In this Issue

Multimedia capability is rapidly becoming a standard feature in today's workstations.
In this issue we have nine articles that describe just such a workstation, the
HP 9000 Model 712. The Model 712 is an entry-level workstation with high-performance
features that make it an excellent platform for multimedia tools and applications.
The article on page 6 provides an overview of the Model 712, showing how the system
is based on three VLSI chips: a multimedia-enhanced PA-RISC processor, the PA 7100LC,
a highly integrated I/O chip, and a high-performance graphics chip.

Flawless execution of a product's development is not the only factor that ensures
a product's success. Defining the correct feature set and choosing the right design methodologies
are just as important as the schedule. The article on page 12 describes how the design team for the PA
7100LC processor used this philosophy to guide the design decisions they made in developing the CPU
chip, and the article on page 23 describes how these design decisions impacted the methodologies used
to create, verify, debug, and test the processor chip.

Low manufacturing cost was one of the main goals for the Model 712 workstation. The article on page 36
describes how the Model 712's I/O subsystem was designed with this goal in mind. The I/O chip,
called LASI, which is an acronym for the two major pieces of functionality on the chip, LAN and SCSI,
integrates several I/O functions on one chip. Both the LAN and SCSI designs were purchased from outside
vendors and imported into the HP IC design process at the artwork and netlist levels respectively.

Besides performance and functionality, low manufacturing cost was also a primary goal for the graphics
chip described in the article on page 43. This was achieved by extracting as much performance and
functionality as possible from readily available technology and integrating components such as the color
lookup table and the frame buffer onto one chip. One of the features incorporated on the graphics chip
is a technology called HP Color Recovery, which is described in the article on page 51. Using a low-cost
8-bit frame buffer and HP Color Recovery, the graphics chip can display images that are in many cases
visually indistinguishable from those of a 24-bit frame buffer costing three times more.

The combination of software and hardware optimizations, including the implementation of a small set of
PA-RISC multimedia software instructions, enables the video player in the HP MPower 2.0 product to play
back MPEG compressed video at real-time rates of up to 30 frames per second. As the article on page 60
explains, this is the first implementation in which real-time MPEG video decompression has been
achieved via software running on a general-purpose processor. The multimedia enhancements allow
four parallel operations per cycle by partitioning each of the 32-bit ALUs.
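
The idea of partitioning a 32-bit ALU can be illustrated with a minimal Python sketch. It models one 32-bit addition as two independent 16-bit additions in which the carry out of the lower half is not allowed to ripple into the upper half; with two such ALUs, four 16-bit operations complete in one modeled cycle. The function names are hypothetical, and simple wraparound on overflow is an assumption of this sketch, not a description of the actual PA 7100LC instructions.

```python
def halfword_add(a: int, b: int) -> int:
    """Add two 32-bit words as two independent 16-bit lanes (wraparound assumed)."""
    lo = ((a & 0xFFFF) + (b & 0xFFFF)) & 0xFFFF                    # carry out of the low lane is dropped
    hi = (((a >> 16) & 0xFFFF) + ((b >> 16) & 0xFFFF)) & 0xFFFF    # high lane never sees that carry
    return (hi << 16) | lo


def dual_alu_cycle(pair0: tuple[int, int], pair1: tuple[int, int]) -> tuple[int, int]:
    """Model two partitioned ALUs: four 16-bit additions in one modeled cycle."""
    return halfword_add(*pair0), halfword_add(*pair1)


if __name__ == "__main__":
    # An ordinary 32-bit add of these operands would carry into the upper half.
    print(hex(halfword_add(0x0001FFFF, 0x00010001)))
```

An ordinary 32-bit add of the two operands in the usage example would let the low-half overflow spill into the upper half; the partitioned version keeps the two halves independent, which is what makes it useful for packed pixel or sample data.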

Integrating telephone capabilities on a workstation is a natural step in the evolution of the electronic
office. The HP TeleShare option card for the Model 712 workstation, which is described in the article on
page 69, represents HP's first telephony product. HP TeleShare provides two-line support, with each line
configurable for voice, fax, or data.

The product design for the Model 712, described in the article on page 75, shows how a design with no
fasteners and using environmentally friendly materials and low-cost parts can provide excellent
manufacturability and customer ease of use.

The PA 7100LC processor and the LASI chip are also used in a series of low-end multiuser business servers,
including the HP 9000 Series 800 Models E25, E35, E45, and E55 and the HP 3000 Series 908, 918, 928,
and 938. The article on page 79 gives an overview of the architecture of these products and the process
the development team went through to meet their time-to-market goals.

In today's global economy users must have seamless access to applications and data that might be
thousands of miles from where they are located. The article on page 85 describes a tool called HP
Distributed Smalltalk, which provides an object-oriented environment for the rapid development and
deployment of multiuser, enterprise-wide distributed applications. Based on the object-oriented model,
HP Distributed Smalltalk contains the objects that enable developers to construct applications that
provide such things as easy access to information across the enterprise, dynamic interaction with other
users on the network, insulation from differences in operating environments, interoperability, and code
reuse. The article on page 93 describes an application that was built with HP Distributed Smalltalk. The
application, HP Software Solution Broker, is a client-server system that gives HP's worldwide technical
consultants easy access to the latest HP and non-HP software products and tools for customer
demonstrations and prototyping.

Two papers in this issue are from the 1994 HP Design Technology Conference, a forum for the exchange
of ideas, best practices, and results among engineers involved in the development and application of
integrated circuit design technologies. ► After trying techniques that did not provide enough information
to track down the root cause of a failure in the FPALU of the PA 7100LC processor, the design team
decided to use a methodology called voltage contrast imaging to find the problem. Voltage contrast
imaging (page 102) allows visual tracking of logical level problems to their source on operating circuits
using a scanning electron microscope. ► In many IC design centers today design for testability (DFT) is
not just an abstract goal but a necessity. The article on page 107 describes how a design team faced with
the need to test over twenty new ASIC components going into four different workstation and multiuser
computer models formed a DFT team to develop a common system-level DFT architecture so that subsystem
parts could be shared without affecting the manufacturing test flow.

C.L. Leath
Associate Editor 



Cover 

An artistic rendition of the interconnection between the three main VLSI chips that make up the hardware
architecture for the HP 9000 Model 712 workstation. The die photos are for the PA 7100LC processor
(top), the graphics chip (lower left), and the LASI chip (lower right).



What's Ahead 

In the June issue we'll have ten articles on the design of the HP G1600A capillary electrophoresis
instrument, a new liquid-phase sample separation system for analytical chemists. We'll also have articles on
COBOL SoftBench, a product that encapsulates COBOL in the SoftBench development environment, HP
Disk Array, a fault-tolerant mass storage solution for PC networks, and two more papers from the 1994
HP Design Technology Conference.

A Low-Cost, High-Performance
PA-RISC Workstation with Built-in
Graphics, Multimedia, and Networking
Capabilities

Designing as a set the three VLSI components that provide the core
functions of CPU, I/O, and graphics for the HP 9000 Model 712 workstation
balanced performance and cost and simplified the interfaces
between components, allowing designers to create a system with high
performance at a low cost.

by Roger A. Pearson



Designing a workstation entails defining various functional
blocks to work together to provide a set of features at a desired
level of performance at the lowest possible cost. Often,
many parts of the design are leveraged from previous designs,
and only new functionality is designed from scratch.
This approach may save development costs, but could result
in a product that is more costly to build.

When one component of the system design has performance
that can't be taken advantage of, whether because of architecture
limitations or other components' performance limitations,
then the system design suffers by having to carry the
cost of that unused performance. By designing with the total
system in mind, so that all components of the design are
optimized to work together with no wasted performance,
cost can be minimized. The designers of the HP 9000 Series
700 Models 712/60 and 712/80 took this approach to offer a
high-performance combination of graphics, multimedia, and
networking capabilities at new low prices. The objectives of
the new design included:

• Providing the high performance of a PA-RISC workstation at
the lowest possible cost

• Improving the performance and capabilities of multimedia
functions through simple extensions to the instruction set

• Enabling an extensive set of communication features
through low-cost option cards

• Designing for high-volume manufacturing.

Instrumental in meeting these objectives was the decision to
design three new custom VLSI chips together, as a set, to
achieve new levels of price/performance for the core functions
of CPU, I/O, and graphics.

Overview 

Three new VLSI chips provide most of the functionality of the
Model 712 workstation. The PA 7100LC CPU chip interfaces
directly to the cache and main memory. The LASI (LAN/SCSI)
chip does most of the core I/O needed for entry-level
workstations. The graphics subsystem consists of the graphics
chip and the frame buffer VRAMs. All three chips communicate
through the GSC (general system connect) bus.
Fig. 1 shows a block diagram of the Model 712 system.

The Models 712/60 and 712/80 are very similar and differ
only in their cache sizes and cache speeds and in the main
system clock speeds.

The Processor 

The compute power of the Model 712 system is provided by
the PA-RISC PA 7100LC processor,1,2 which is packaged in a
432-pin ceramic PGA. The CPU design was optimized for the
Model 712 and includes the following features:

• Superscalar CPU 

• 1K-byte instruction buffer

• Multimedia support 

• Cache control for up to 2M bytes of external cache

• ECC (error correction coding) memory controller 

The clock frequencies of the Model 712/60 and the Model
712/80 are 60 MHz and 80 MHz respectively. The PA 7100LC
is described in more detail in the article on page 12.

Cache 

The PA 7100LC CPU uses an external cache. An external
cache allows system designers to change the size of the
cache easily to meet their performance and cost goals.
Furthermore, off-chip cache provides all the performance
necessary without limiting the CPU frequency.

The external cache is 64K bytes on the Model 712/60 and
256K bytes on the Model 712/80 and is logically split into
equal halves for the instruction and data caches. Combining
the caches saved pins on the CPU. To further reduce costs,
industry-standard SRAMs (static RAMs) are used. Table I
shows the SRAMs used in the Model 712 systems.

Table I
Static RAMs Used in the Model 712 Systems

Model    Function   Size           Speed   Quantity
712/60   Tag        8K bytes       12 ns   4
         Data       8K bytes       12 ns   6
         Data       8K x 9 bits    12 ns   2
712/80   Tag        32K bytes      10 ns   4
         Data       32K bytes      10 ns   6
         Data       32K x 9 bits   10 ns   2

Main Memory 

The main memory for the Model 712 systems has been engineered
to provide high performance with industry-standard
70-ns SIMMs (single inline memory modules). Currently supported
SIMMs are available in 4M-, 8M-, 16M-, and 32M-byte
sizes. Four slots are available and must be filled in pairs for
a maximum of 128M bytes.

The Model 712's main memory design minimizes the average
cache miss penalty. The main memory controller returns
double words (eight bytes, since a word is four bytes) back
to the CPU. Each cache line is made up of four double
words. When there is a cache miss, the one double word of
the four in the cache line that was missed is referred to as
the critical word. To minimize the miss penalty, the double
word containing the critical word is sent back to the CPU
first, followed by the remaining three double words.
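
The critical-word-first return order can be sketched in a few lines of Python. The line and double-word sizes come from the text; the function name is hypothetical, and the wraparound order used for the three remaining double words is an assumption, since the text only says that they follow the critical one.

```python
DOUBLE_WORD_BYTES = 8                      # a double word is eight bytes (from the text)
LINE_DOUBLE_WORDS = 4                      # four double words per cache line (from the text)
LINE_BYTES = DOUBLE_WORD_BYTES * LINE_DOUBLE_WORDS


def fill_order(miss_address: int) -> list[int]:
    """Return the double-word addresses of a line fill, critical double word first.

    Assumption: the remaining three double words follow in wraparound order.
    """
    line_base = miss_address - (miss_address % LINE_BYTES)
    critical = (miss_address % LINE_BYTES) // DOUBLE_WORD_BYTES
    order = [(critical + i) % LINE_DOUBLE_WORDS for i in range(LINE_DOUBLE_WORDS)]
    return [line_base + index * DOUBLE_WORD_BYTES for index in order]


if __name__ == "__main__":
    # A miss at byte 0x1014 falls in the third double word of the line at 0x1000.
    print([hex(a) for a in fill_order(0x1014)])
```

The CPU can resume as soon as the first double word arrives, so the rest of the line fill overlaps with useful work instead of adding to the miss penalty.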

Bandwidth is maximized by using fast page mode when consecutive
accesses reside on the same page. This is often the
case when large blocks of memory are accessed and is very
common in windowed graphics systems.

The General System Connect Bus

The general system connect, or GSC, is the local bus that
connects the three VLSI devices and the optional I/O card.
The GSC bus is designed to provide maximum bandwidth
for memory-to-graphics transfers. The bus has 32-bit multiplexed
address and data lines to minimize the number of
signals. Other features of the bus include:

• Operation at half the CPU frequency (30 or 40 MHz)

• Support for 1-, 2-, 4-, 8-, 16-, or 32-byte transactions

• Central arbitration

• Parity generation and checking.

Normally, bus transactions are terminated by a turnaround
state that allows drivers to be turned off before the drivers
for the next transaction are turned on. To improve graphics
performance, the bus supports back-to-back writes to the
same device without the turnaround state. This improves
throughput on transfers of large blocks of data from main
memory to graphics.
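
The benefit of skipping the turnaround state can be estimated with a rough cycle-count model. The 32-bit bus width and the 40-MHz bus clock come from the text; the single address cycle and single turnaround cycle per transaction are assumptions made only for this sketch, and the function names are hypothetical.

```python
BUS_BYTES_PER_CYCLE = 4   # 32-bit multiplexed address/data lines (from the text)
ADDRESS_CYCLES = 1        # assumption: one cycle to present the address
TURNAROUND_CYCLES = 1     # assumption: one idle cycle to switch bus drivers


def write_burst_cycles(transactions: int, bytes_each: int, back_to_back: bool) -> int:
    """Bus cycles needed for a run of writes to the same device."""
    data_cycles = -(-bytes_each // BUS_BYTES_PER_CYCLE)      # ceiling division
    per_transaction = ADDRESS_CYCLES + data_cycles
    if back_to_back:
        # Only the final transaction is followed by a turnaround state.
        return transactions * per_transaction + TURNAROUND_CYCLES
    return transactions * (per_transaction + TURNAROUND_CYCLES)


def throughput_mbytes_per_s(transactions: int, bytes_each: int,
                            back_to_back: bool, bus_mhz: float) -> float:
    """Sustained throughput in millions of bytes per second under this model."""
    cycles = write_burst_cycles(transactions, bytes_each, back_to_back)
    return transactions * bytes_each * bus_mhz / cycles


if __name__ == "__main__":
    for mode in (False, True):
        print(mode, round(throughput_mbytes_per_s(100, 32, mode, 40.0), 1))
```

Under these assumed cycle counts, removing one idle cycle from every 32-byte write raises sustained throughput by roughly a tenth; the real gain depends on the actual GSC protocol timing, which this model does not capture.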

During transfers from memory to I/O, it is sometimes necessary
to lock the CPU out of memory (e.g., when semaphores
are used). To facilitate this, the GSC provides a locking
mechanism, which prevents the CPU from accessing memory
(to service a cache miss, for example).

Graphics 

The graphics subsystem consists of a graphics chip and four
on-board VRAMs (video RAMs), which provide a 1024-by-768-pixel
frame buffer with a depth of eight planes at a refresh
rate of 72 Hz. An optional high-resolution VRAM board
increases resolution to 1280 by 1024 pixels.
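
The storage these figures imply is easy to work out. The following sketch computes the frame buffer size and the visible pixel rate from the numbers in the text; the function names are hypothetical, the pixel rate ignores blanking intervals, and keeping eight planes at the higher resolution is an assumption.

```python
def frame_buffer_bytes(width: int, height: int, planes: int) -> int:
    """Bytes needed to hold one frame at the given depth (one bit per plane per pixel)."""
    return width * height * planes // 8


def visible_pixel_rate(width: int, height: int, refresh_hz: int) -> int:
    """Visible pixels scanned out per second (blanking intervals are ignored)."""
    return width * height * refresh_hz


if __name__ == "__main__":
    print(frame_buffer_bytes(1024, 768, 8))       # standard resolution, eight planes
    print(frame_buffer_bytes(1280, 1024, 8))      # optional board, eight planes assumed
    print(visible_pixel_rate(1024, 768, 72))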

The graphics chip was designed with the other system components
to provide high performance at a minimal cost. For
more information on the graphics chip, see reference 3 and
the article on page 43.

Built-in I/O 

The Model 712 features a number of built-in I/O devices that
are intended to address the needs of the majority of users.

Support for these functions is provided largely by the LASI
I/O VLSI chip. LASI is a highly integrated chip that provides
a significant reduction in system cost and increased reliability.
The chip is packaged in a 240-pin MQUAD package. The
LASI chip is described in more detail in the article on page
36 and in reference 4.

The following sections briefly describe the LASI chip's
built-in capabilities.

IEEE 802.3 LAN. LASI contains an Intel 82C596 megacell,
which was ported to work with HP's IC process. The LAN
transceiver, which was not practical to include on LASI, is
located on the printed circuit board. The transceiver interfaces
to both the AUI (attachment unit interface) and EtherTwist
media.

SCSI. The Model 712 uses an 8-bit single-ended SCSI interface
for the optional internal hard drive and external peripherals.
The SCSI-2 interface is implemented entirely within
LASI through a megacell that was designed by HP and NCR.
A netlist for the NCR 53C710 was imported into HP's design
environment. The design was then tuned to work in HP's IC
process.

By keeping the SCSI bus stub length to a minimum on the
printed circuit board and on the connection to the optional
internal drive, SCSI termination on the internal side is
greatly simplified. Short stub lengths allow the bus to be
terminated on the printed circuit board, whether the optional
internal drive is present or not. This saves cost by
obviating the need for special terminators, which would
otherwise have to be enabled or disabled (manually or electrically),
depending on the presence or absence of the optional
internal drive.

Audio. 16-bit CD-quality audio playback and record capability
is provided by the audio circuitry, which consists of a Crystal
Semiconductor CS4216 CODEC and supporting circuitry. The
LASI chip also includes the serial interface to the CS4216.
Headphone, microphone, and line-in connectors are located
on the rear panel. Standard sampling rates include 8, 44.1,
and 48 kHz.

Real-Time Clock. A real-time clock is designed into the LASI
chip. Battery backup keeps time while the workstation is
powered down.

PS/2. There are two PS/2 connectors on the rear panel that
allow connection to a low-cost industry-standard keyboard
and mouse. The PS/2 interface circuitry is integrated into
the LASI chip.

RS-232. An RS-232 interface has also been designed into the
LASI chip. The Model 712 buffers the signals with a MAXIM
211 to provide an RS-232 serial port. LASI buffers inbound
and outbound data with 16-byte FIFOs, at baud rates from
50 to 454 kbits/s.

Parallel. The LASI chip also provides a parallel port conforming
to the Centronics industry standard.

Flexible Disk Support. A Western Digital WD37C65C flexible
disk controller interfaces LASI to an optional internal
personal-computer-style flexible disk drive.

Flash EPROM. An 8-bit bus on the LASI chip is demultiplexed
by two 74HCT374 latches to provide the address and data
lines necessary to address the two 128K-byte flash EPROMs
that contain the boot firmware. The flash EPROMs are also
used to store configuration parameters, eliminating the need
for an EEPROM and its associated cost.

I/O System Support. LASI provides a number of miscellaneous
I/O system support functions, including:

• Clock generation. LASI derives all the necessary clocks
required by the I/O circuitry from the main system clock. It
does so by using simple divide-by-n counters and two digital
phase-locked loops.

• System arbitration support. LASI arbitrates GSC bus requests
from the I/O devices within LASI, as well as from the
CPU and optional expansion card.

• Interrupt support. LASI also provides and manages external
interrupt capability for the various I/O devices.

Optional I/O 

For those users who need functionality beyond that provided
by the built-in I/O, the Model 712 includes two personality
slots that can be configured with a variety of other I/O functions.
The first of these slots is referred to as the expansion
slot and includes a connection to the GSC bus. The second
slot provides a connection to the serial audio stream, and
is intended for telephone functions. This slot is called the
telephony slot.

Expansion Cards. Expansion cards are optional cards that
connect directly to the GSC bus to provide a variety of other
I/O functions.

Since LASI has a configurable address space and can be
configured as an arbitration slave, many of the expansion
cards rely on a second LASI chip to implement much of their
functionality.

The following optional expansion cards are provided for the
Model 712:

• Second serial port. The second serial port card uses its own
LASI chip and support circuitry identical to that on the system
board to provide an additional RS-232 port.

• Second LAN AUI and second serial interface. This card also
uses a LASI chip and circuitry similar to that on the system
board to add an additional IEEE 802.3 LAN with an attachment
unit interface (AUI) and a second RS-232 interface.

• X.25 and second serial interface. A Motorola 68302 multiprotocol
processor interfaced to the 8-bit bus of a slave
LASI provides X.25 networking to a 25-pin X.21bis port for
speeds of 1.2 kbits/s to 19.2 kbits/s. The second RS-232 serial
interface is implemented in the same fashion as the
other cards.

• Second display. A second display can be added to the system
with the second display card. This card duplicates the


8 April 1995 Hewlett-Packard Journal



© Copr. 1949-1998 Hewlett-Packard Co.






graphics functionality that is already built into the system
board by replicating the graphics chip and its supporting
circuitry.

• Token Ring/9000. The Token Ring/9000 card provides IEEE
802.5 LAN functionality through the use of a Texas Instruments
token ring controller chip and a custom ASIC that
provides the GSC interface. Unshielded and shielded
twisted pair connections are provided at data rates from 4
Mbits/s to 16 Mbits/s.

• Second display and second LAN AUI/RS-232. This option
combines the features of the second graphics display and
the second LAN AUI/RS-232 options. Since the circuitry for
this option would not fit on a single expansion slot card,
some of the circuitry resides on a daughter card that is connected
to the expansion slot card. The daughter card gets
power and mechanical support through the telephony connector,
so when this option is installed, the telephony option
is not available.

Telephony. The telephony card installs in the telephony slot
and provides two lines of telephone access. Each of the
lines can be configured to support voice, data modem, or fax
modem.



Fig. 2. Block diagram of the Model 712 audio and telephony
circuits.

The system board's headset and microphone serve as the
human interface for voice telephony, and an interface chip
on the telephony card called XBAR links the system board's
audio circuitry to the telephony functions (see Fig. 2).

This arrangement allows recording and playback during
telephone conversations. It also supports digital mixing of
microphone, line-in, telephone, and prerecorded audio.
Caller-ID decoding is supported, as are DTMF (dual-tone
multifrequency) encoding and decoding, and dual-line
conferencing.
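
DTMF encoding and decoding can be sketched briefly. The row and column frequencies below are the standard DTMF assignments, and the 8-kHz sampling rate matches one of the audio rates mentioned earlier. Everything else is an illustrative assumption: the function names are hypothetical, the decoder uses the Goertzel algorithm as a common textbook technique, and nothing here represents the firmware that runs on the telephony card's DSPs.

```python
import math

ROW_HZ = (697, 770, 852, 941)          # standard DTMF row frequencies
COL_HZ = (1209, 1336, 1477, 1633)      # standard DTMF column frequencies
KEYS = ("123A", "456B", "789C", "*0#D")


def dtmf_encode(key: str, sample_rate: int = 8000, duration_s: float = 0.05) -> list[float]:
    """Return samples of the two summed sine tones that represent one key."""
    if len(key) != 1 or not any(key in row for row in KEYS):
        raise ValueError(f"not a DTMF key: {key!r}")
    row = next(r for r, keys in enumerate(KEYS) if key in keys)
    col = KEYS[row].index(key)
    count = int(sample_rate * duration_s)
    return [0.5 * math.sin(2 * math.pi * ROW_HZ[row] * n / sample_rate)
            + 0.5 * math.sin(2 * math.pi * COL_HZ[col] * n / sample_rate)
            for n in range(count)]


def goertzel_power(samples: list[float], freq_hz: float, sample_rate: int = 8000) -> float:
    """Signal power near one frequency, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s_prev, s_prev2 = x + coeff * s_prev - s_prev2, s_prev
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def dtmf_decode(samples: list[float], sample_rate: int = 8000) -> str:
    """Pick the strongest row tone and the strongest column tone."""
    row = max(range(4), key=lambda i: goertzel_power(samples, ROW_HZ[i], sample_rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, COL_HZ[i], sample_rate))
    return KEYS[row][col]


if __name__ == "__main__":
    print(dtmf_decode(dtmf_encode("5")))
```

A production decoder would also check tone duration, relative tone levels, and the absence of speech energy before accepting a digit; this sketch only picks the strongest tone pair.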

The XBAR chip serves to route information between the
LASI I/O chip, the audio CODEC, and the DSP blocks in a
variety of programmable ways. Data is transferred to and
from the system board through two serial data paths. Two
additional serial paths send and receive data to and from the
DSPs. Two 8-bit parallel ports are used by the DSPs during
the DSP boot process. XBAR has a few other functions, including
receiving incoming phone rings and controlling
phone line hook status.

Each DSP subsystem consists of an Analog Devices 
ADSP2101 processor and 32K by 24 bits of external 20-ns 




Fig. 3. The Model 712 system board.

SRAM for DSP programs and data. Each processor has two
serial ports, one for XBAR and the other for the Analog Devices
AD28msp01 analog front end (phone CODEC). Each
phone CODEC connects to a standard two-wire telephone
line through a Silicon Systems Incorporated 73M9002 data
access arrangement, which provides the isolation circuitry
required by communications regulatory agencies.

The telephony card is described in more detail in the article
on page 69.

Printed Circuit Board Design 

The Model 712 system contains a single printed circuit board
called the system board. Fig. 3 shows a photograph of the
system board. The system board supports all the functionality
of the Model 712 system except for the optional boards and
peripherals.

The system board is 10 layers deep, and has 0.005-inch
traces and spaces. It measures 11.4 inches by 5.6 inches and
uses double-sided surface mount technology.

The board construction shown in Fig. 4 was designed with
the printed circuit board vendor to ensure that the least
costly materials were chosen to obtain the necessary electrical
parameters. Although it is designed to exhibit specific
trace impedances, the blank printed circuit board is not a
controlled-impedance design, which saves cost. The finished
board size is optimized to make the best use of standard
subpanel sizes used by the printed circuit board vendor.
Although the board does use 0.005-inch traces and spaces,
these minimum geometries are used only when necessary.
Whenever possible, less aggressive routing is used to help
with board yield and to keep down the cost of the board.

The design of the blank printed circuit board presented a
number of technical challenges and some cost-saving
opportunities.

Performance Challenges. The clock and cache layouts presented
some very special challenges in designing the printed
circuit board.

Fig. 5 shows a simplified block diagram of the clock circuit
used in the Model 712. All ECL circuitry is powered from the
Vcc supply, and all clock receivers in the VLSI are designed
to operate at these shifted ECL voltage levels. This saves the
cost of additional supply voltages and level translators. The
master clock is first buffered and multiple copies are routed
to the receiving VLSI. This way, the delay to each device can
be independently controlled to minimize clock skew and
maximize system performance. Clocks are all routed on
inner layers, where propagation delay is better controlled
because of the trace's stripline nature. The clocks are driven
as differential pairs and are routed close to each other to
minimize differential noise generation and susceptibility. The
clock circuitry also features an interesting termination
scheme. This pi-termination network is designed to approximate
the same load as other more traditional termination
schemes. However, it has the advantage of using zero supply
current and fewer parts.

Fig. 6 shows a conceptual representation of how the cache
is routed. The cache line is routed to minimize cache address
drive delay. This arrangement also cuts down on the
number of vias and maintains an unbroken ground plane.
Address lines are routed from the CPU to the first via split
on inner layers, where the impedance is close to half that of
the outer layers. This is to better match the impedance of
the traces on the two outer layers, which are essentially in
parallel.
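
The reason for the half-impedance inner trace can be checked with two textbook formulas: parallel impedances combine like parallel resistors, and the reflection coefficient at a junction is (Z_load - Z_line) / (Z_load + Z_line). The sketch below uses a 60-ohm outer-layer impedance purely as a hypothetical value; the article does not give the actual trace impedances, and the function names are also hypothetical.

```python
def parallel_impedance(*z_ohms: float) -> float:
    """Combined impedance seen looking into several branches at one junction."""
    return 1.0 / sum(1.0 / z for z in z_ohms)


def reflection_coefficient(z_line: float, z_load: float) -> float:
    """Fraction of the incident voltage wave reflected at the junction."""
    return (z_load - z_line) / (z_load + z_line)


if __name__ == "__main__":
    z_outer = 60.0                                   # hypothetical outer-layer impedance
    z_branches = parallel_impedance(z_outer, z_outer)
    print(z_branches)                                # load presented by the two outer traces
    print(reflection_coefficient(z_outer / 2, z_branches))   # half-impedance inner trace
    print(reflection_coefficient(z_outer, z_branches))       # same-impedance inner trace
```

With an inner trace at half the outer impedance the junction is matched and the reflection is essentially zero; an inner trace at the full outer impedance would reflect about a third of the incident wave back toward the CPU.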

EMC and EMI Control. In addition to more traditional methods
of EMC and EMI control, the Model 712 system board uses
features built into the blank printed circuit board to mimic
the functionality of equivalent discrete designs. However,
since they are built into the printed circuit board, their
benefits are essentially free.

Small spark gaps are placed near many of the connectors to
help control ESD. These spark gaps are simply very small
trace segments separated at minimum geometries to provide
a shunt path for ESD energy from signal to ground.

To control RFI, the printed circuit board makes use of a
number of buried capacitors. Buried capacitors are essentially
small capacitors whose plates are all or part of the
printed circuit board's signal or ground layers. The dielectric
material of the printed circuit board serves to separate the
plates of the capacitors. Each power plane is effectively
bypassed to ground by placing a ground plane in close proximity
to it. Furthermore, some signals are also bypassed to
ground with small buried capacitors to shunt unwanted RFI
energy to ground.
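
The size of such a buried capacitor follows from the parallel-plate formula C = e0 * er * A / d. The sketch below applies it to a full-board power and ground plane pair using the 11.4-inch by 5.6-inch board outline given earlier. The 0.005-inch plane separation and the relative permittivity of 4.5 are hypothetical values chosen only for illustration, as are the function names; the article does not state the dielectric material or layer spacing.

```python
EPSILON_0 = 8.854e-12        # permittivity of free space, farads per meter
METERS_PER_INCH = 0.0254


def plate_capacitance_farads(area_m2: float, separation_m: float,
                             relative_permittivity: float) -> float:
    """Ideal parallel-plate capacitance, ignoring fringing fields and cutouts."""
    return EPSILON_0 * relative_permittivity * area_m2 / separation_m


def plane_pair_capacitance_farads(width_in: float, height_in: float,
                                  separation_in: float,
                                  relative_permittivity: float) -> float:
    """Capacitance between two board-sized planes, with dimensions given in inches."""
    area_m2 = (width_in * METERS_PER_INCH) * (height_in * METERS_PER_INCH)
    return plate_capacitance_farads(area_m2, separation_in * METERS_PER_INCH,
                                    relative_permittivity)


if __name__ == "__main__":
    # Board outline from the text; spacing and permittivity are hypothetical.
    c = plane_pair_capacitance_farads(11.4, 5.6, 0.005, 4.5)
    print(f"{c * 1e9:.1f} nF")
```

Under these assumptions the plane pair contributes on the order of ten nanofarads of distributed, very low-inductance bypass capacitance at no additional parts cost.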

Conclusion 

By taking the approach of designing from the ground up, the
Model 712 hardware designers have optimized each part of
the design to work together to provide outstanding performance
at very low cost. Designing the VLSI components as a
set balanced performance and cost and also simplified the
interfaces between the devices. By building in the features
wanted by most customers and making less common features
available only on low-cost option boards, the system cost is
minimized for most customers.

The Model 712 system performance is summarized in Table
II.

The PA 7100LC Microprocessor:

A Case Study of IC Design Decisions 

in a Competitive Environment 

Engineering design decisions made during the early stages of a product's
development have a critical impact on the product's cost, time to market,
reliability, performance, and success.

by Mick Bass, Patrick Knebel, David W. Quint, and William L. Walker



In today's competitive microprocessor market, successful
design teams realize that flawless execution of product development
and delivery is not enough to ensure that a product
will succeed. They understand that defining the correct
feature set for a product and creating design methodologies
appropriate to implement and verify that feature set are just
as important as meeting the product schedule.

The design decisions that engineers and managers make
while defining a new product have a critical impact on the
product's cost, time to market, reliability, performance, future
market demand, and ultimate success or failure. Engineers
and managers must make trade-offs based on these factors
to decide which features they should implement in a new
product and which they should not. Further, they must plan
their product development effort so that the methodologies
by which they develop their product are sufficient to ensure
that they are able to implement the product definition within
the required cost, schedule, and performance constraints.



Design choices arose frequently while we were defining and
implementing the PA 7100LC microprocessor.1 We were targeting
the PA 7100LC to be the processing engine of a new
line of low-cost, functionally rich workstation and server
products. Our design goals for the CPU were to provide the
system performance required for our target market at an
aggressively low system cost and to deliver the CPU on a
schedule that would not delay what was to become HP's
steepest computer system production ramp to date. Fig. 1
shows a simplified block diagram of the PA 7100LC
processor.

To meet these goals, we sometimes had to shift
our focus from the CPU to the impact of a particular feature
upon performance and cost at the system level. Hewlett-Packard's
position as a vendor of both microprocessors and
computer systems allowed us to use this technique with
much success.2,3 Even with this focus, however, the correct
decision could be far from obvious. We often identified several
alternative implementations of a particular feature,
each with its own impact on cost, schedule, and performance.
Trading these impacts against one another proved
very challenging. Design decisions also impacted each other,
with the outcome of one serving as a critical input to others.
The effect of a decision, for this reason, was sometimes
much larger than would have appeared at first glance. Sometimes
decisions created additional requirements, either for
new features or for new support methodologies. All of these
factors played together to underscore the fact that it was
critical to our product's success to have a decision process
that worked well.

We knew that a good definition of the PA 7100LC would
require that we make feature decisions in several areas,
including:

• Cache organization

• Number of execution units and superscalar design

• Pipeline organization

• Floating-point functionality

• Package technology

• Degree of integration

• Multimedia enhancements.

We also knew that we needed to select development methodologies
consistent with the feature decisions that we
made. Product features and required design methodologies
are often strongly connected. We couldn't consider the benefits
of one without the costs of the other, and vice versa.
Methodologies that were impacted by our feature-set decisions
included:

• Synthesis

• Place and route

• Behavioral simulation

• Presilicon functional verification

• Postsilicon functional and electrical verification

• Production test.

These methodologies are discussed in the article on page 23.

The cumulative effects of our decisions led to the creation
of a low-cost, single-chip processor core that includes a
built-in memory controller, a combined, variable-size
off-chip primary instruction and data cache, a 1K-byte
on-chip instruction buffer, and a superscalar execution unit
with two integer units and one floating-point unit. We
reduced the size and performance of the floating-point unit,
which we had leveraged from the PA 7100 processor. We added
scan, sample-on-the-fly, and debug modes to enhance
testability, reduce test cost, and accelerate the
postsilicon schedule. We tailored the methodologies by which
we created the chip to match the features that we had
decided upon.

This article provides examples of our decision-making
process by exploring the decisions that we made for several
of the features listed above. In each case, we present the
alternatives that we considered, the costs and benefits of
each, and the impact on other features and methodologies. We
discuss our decision criteria. Since we strive to continually
improve our ability to make good design decisions, we also
present, wherever possible, a bit of hindsight about the
process. In most cases, we still believe that we selected the
correct alternative. However, if this is not the case, we
discuss what we have learned and the modifications we made
to our process to incorporate this new knowledge.

The Design Decision Process 

Most design decisions ultimately come down to trade-offs
between cost, schedule, and performance. Unfortunately, it
is often difficult to determine the true cost, schedule, or
performance for the wide variety of implementations that
are possible. And since these three factors most often play
against each other, it is necessary to make sacrifices in one
or two of the areas to make gains in the others.

The cost of a processor core is determined by the cost of
silicon die, packaging, wafer testing, and external SRAM and
DRAM. Breaking this down, we find that the cost of a die is
determined by the initial wafer cost and the defect density
of the IC process being used. Wafers are more expensive for
more advanced processes because of higher equipment,
development, and processing costs. The die packaging costs
are determined primarily by package type and pin count.
Large-pinout packages can be very expensive. An often
ignored cost is the tester time required to determine that a
manufactured part is functional. Reducing the time needed
for wafer and package testing directly reduces costs. Finally,
SRAM and DRAM costs are determined by the number, size,
and speed of the parts needed to complete the design.
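
The dependence of die cost on wafer cost and defect density
can be sketched numerically. This is only an illustration,
not the cost model used by the design team: it assumes a
simple Poisson yield model, and the function name and all
input values are hypothetical.

```python
import math

def cost_per_good_die(wafer_cost, dies_per_wafer, defects_per_cm2, die_area_cm2):
    """Cost of one working die under a Poisson yield model (an assumption)."""
    # Expected fraction of dies that have zero defects.
    die_yield = math.exp(-defects_per_cm2 * die_area_cm2)
    return wafer_cost / (dies_per_wafer * die_yield)

# Hypothetical numbers for a 1.4 cm x 1.4 cm die (1.96 cm^2).
mature_process = cost_per_good_die(1000.0, 50, 0.5, 1.96)
advanced_process = cost_per_good_die(2000.0, 50, 1.0, 1.96)
```

With these made-up inputs, doubling both wafer cost and
defect density raises the cost per good die by much more
than a factor of two, because yield falls exponentially with
defect density.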

The schedule of a project is determined by the complexity of
the design and the ability to leverage previous work. Each
design feature requires certain time investments and has
associated risks. Time is required for preliminary
feasibility investigations, design of control algorithms,
implementation of circuits, and presilicon and postsilicon
verification. Schedule risks include underestimation of time
requirements because of unexpected complexity and the extra
chip turns required to fix postsilicon bugs associated with
complex design features.

Performance is conceptually simple, but because of the
intricacy of processor design it is often difficult to
measure without actual prototypes. HP has invested heavily
in performance simulation and analysis of its designs.
Results from HP's system performance lab were invaluable in
making many of the design decisions for the PA 7100LC. By
supporting a detailed simulation model of each processor
developed by HP, the system performance lab is able to
provide quick feedback about proposed changes. HP also uses
these models after silicon is received to help software
developers (especially for compilers and operating systems)
determine bottlenecks that limit their performance.

Engineers at the system performance lab design their
processor simulators in an object-oriented language to allow
easy leverage between implementations. All processor features
that affect performance are modeled accurately by close
teamwork between the performance modeling groups and the
hardware design groups. As the hardware group considers a
change to a design, the change is made in the simulator, and
simulations are done to allow simple comparisons that differ
by only a single factor. This is continued in an iterative
fashion until all design decisions have been made,



© Copr. 1949-1998 Hewlett-Packard Co.

April 1995 Hewlett-Packard Journal 13



Fig. 2.* (a) 432-pin ceramic pin grid array (432-CPGA). (b) A 240-pin MQUAD. (c) A 304-pin MQUAD.
* The CPGA package is manufactured by Kyocera Inc. and the MQUAD packages are manufactured by Olin Interconnect Technologies.



after which we are left with a simulator that matches the
hardware to be built.

Without performance simulations, it would be very difficult
to estimate performance for a proposed implementation.
Even something as simple as a change in operating frequency
has effects that are difficult to estimate because of the
interactions between fixed memory access times and processor
features. As processor frequency increases, memory latencies
measured in processor cycles increase, but this increased
latency is sometimes (but not always) hidden by features
such as stall-on-use. Stall-on-use allows the processor to
continue execution in the presence of cache misses as long
as the data is not needed for an operation. These
interactions make accurate hand calculations impossible,
creating a need to use simulations for comparing
many different implementation options.
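
The interaction between a fixed memory access time and the
clock frequency can be illustrated with a short sketch. It
is a deliberately simplified model, not the simulator
described here; the 120-ns access time and the function
names are hypothetical.

```python
import math

def miss_latency_cycles(access_time_ns, freq_mhz):
    """Processor cycles needed to cover a fixed memory access time."""
    # One cycle lasts 1000 / freq_mhz nanoseconds.
    return math.ceil(access_time_ns * freq_mhz / 1000)

def visible_stall_cycles(latency_cycles, independent_work_cycles):
    """Stall cycles left after stall-on-use overlaps independent work."""
    return max(0, latency_cycles - independent_work_cycles)

latency_at_60 = miss_latency_cycles(120, 60)   # 8 cycles
latency_at_80 = miss_latency_cycles(120, 80)   # 10 cycles
```

The same memory costs more cycles at the higher clock, and
how much of that is visible depends on how many independent
instructions follow each miss, which is why the text relies
on simulation rather than hand calculation.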

The performance simulations are based on SPEC and TPC
benchmarks. While these benchmarks are useful for gathering
performance numbers and making comparisons, they do
not tell the whole story. Many applications are not
represented by the benchmarks, including graphics,
multimedia, critical hand-coded operating system routines,
and so on. When evaluating features related to these
applications, we work directly with people in those areas to
analyze the impact of any decisions. Often this involves
analyzing by hand critical sections of the code (e.g., tight
loops) to evaluate the overall performance gain associated
with a feature. For the PA 7100LC, this was especially true
for the multimedia features.

The ability to quantify the impact of proposed features on
cost, schedule, and performance was paramount to our
ability to make sound design decisions.

Integration

The first design decisions that we made were related to the
high-level question "How highly integrated should we make
the chip?" This led to the questions: Should we include an
on-chip cache or not? If so, how large should it be? If we
have an off-chip cache, how should we structure it? How
should the CPU connect to memory and I/O? Should the
memory controller be integrated or not?



The primary question was whether the CPU, cache, and
memory system should live on a single die in a single
package, or whether we should partition this functionality
onto two or more chips.

The trade-offs involved in this decision were numerous. Die
cost would increase for a multichip solution. Package cost
would vary with the partitioning that we chose, as would
package type and maximum pin count. Required
signal-to-ground ratios would vary with package type, which
would either limit the signal count or require more pins (at
a higher cost). Performance, design complexity, and schedule
risk would be greatly impacted by the partitioning decision.

To sort out these trade-offs, we started with a packaging
investigation that quantified cost, performance, and risk for
different packaging alternatives. This investigation yielded
a preferred package: a 432-pin ceramic pin grid array (see
Fig. 2a). This package, with its large signal count, could
accommodate the extra interfaces required to include a
memory controller, an I/O controller, and an external cache
controller.

The memory controller and cache investigations were
tightly coupled. Performance simulations always included
features from both subsystems because small changes in the
behavior of one subsystem could drastically affect the
performance of the other. In the end we realized that the
performance gains brought by an integrated memory controller
enabled smaller, cheaper caches without sacrificing overall
performance. This realization drove the development of the
cache subsystem.

Package Selection and CPU Partitioning. We targeted the IC
package design with the objective of minimizing system cost
with little compromise in performance. The customary
package for CPU chips is either a quad flat pack (QFP) or a
pin grid array. The QFP is a plastic, low-profile package
with gull-wing connections on four sides. The QFP is
inexpensive and easy to mount on a printed circuit board and
has gained acceptance rapidly for surface mounting to
printed circuit boards. It has the disadvantage that the
number of pins is limited. Pin counts above 200 are fragile
and difficult to keep coplanar for surface mounting. The
package also has very limited ability to dissipate heat
because the chip is encased in plastic. A recent improvement
to this package sandwiches the chip between two pieces of
aluminum, which can dissipate up to four watts of heat (ten
watts with a heat sink). It was this metal quad, or MQUAD,
that became a candidate for a low-cost package for our
high-performance CPU. HP's package of choice for previous
CPUs has been the ceramic pin grid array, a complex brick of
aluminum oxide and tungsten built in layers and fired at
2000°C. The PGA used for the PA 7100 processor (the basis
for the PA 7100LC) was a 504-pin design that incorporated
the following advanced features:

• A tungsten-copper heat-conducting slug for superior thermal
conductivity to the heat sink

• Ceramic chip capacitors mounted on the package for power
bypassing

• Thin dielectric layers that minimized power supply
inductance

• Use of 0.004-inch vias internally (most are 0.008-inch).

This package performed its thermal and electrical duties
very well, but its cost had always been an issue.

Our strategy to develop a low-cost CPU coupled chip
partitioning options with packaging options: either using
two low-cost MQUAD packages or placing a single large chip
in a PGA. The two-chip CPU could be placed in one 240-pin
and one 304-pin MQUAD (see Figs. 2b and 2c). The other
alternative was to place a larger integrated chip in a single
432-pin PGA (see Fig. 2a). The first cost estimate assumed
that the PGA would be priced similarly to the 504-pin
package. The total cost of both MQUAD chips was initially
thought to be about 75% less than the PGA estimate. This
would seem to indicate that the MQUAD would be the
definite candidate to meet our low-cost goals. However, that
perception changed as our investigation continued.

We didn't expect the MQUAD's electrical performance to
match that of the PGA because the MQUAD we were
considering had only one layer of signals and no ground
planes. Ground planes can be used to shield signal traces
from each other and reduce inductances of signals and power
supplies. The PGA could incorporate several ground planes if
necessary. On the other hand, the MQUAD package can only
approach the shielding effect of the ground planes by making
every other lead a ground, which severely limits the number
of usable signals. Gaining a lower package price by using the
MQUAD would require redesigning the I/O drivers
specifically to reduce rise times and thereby control
crosstalk and power supply noise.

The PA 7100 PGA's electrical performance exceeded the
needs of this chip, so the strategy shifted to trading away
excess performance to gain lower cost. The number of
power and ground planes was reduced to two. The design
was also modified to optimize performance without using
package-mounted bypasses or thin dielectric layers. The
PGA design was reduced to four internal metal layers with
no bypassing, no thin dielectric layers, and no 0.004-inch
vias, all of which reduced cost compared to the 504-pin PGA
mentioned above.

The power dissipation of the chips would also have been an
issue for the MQUADs. Heat sinking to further improve the
thermal resistance of the packages might have been
required. CPU designs are often upgraded to higher clock
speeds after first release, so if package heat dissipation is
marginal, upgrade capability is jeopardized. (Typically,
power dissipation is proportional to operating frequency.)
The 504-pin PGA had already been used to dissipate 25
watts, which left an opportunity for cost-saving
modifications. With the thermal margin in mind, two design
changes were investigated, one to use a lower-cost
copper-Kovar-copper laminate heat spreader, and the other
to eliminate the heat spreader entirely. The first option was
dismissed because of failures found during a low-temperature
storage test. (The laminate heat spreader detached from the
ceramic body because of a disparity in thermal expansion
rates.) The second option was also dismissed when the
thermal resistance of the ceramic carrier was found to be
too high.
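
The upgrade concern can be made concrete with a small
sketch that uses the stated proportionality between power
and frequency. The package limits are the figures quoted in
the text; the 8-watt starting point, the clock speeds, and
the function names are hypothetical and chosen only for
illustration.

```python
MQUAD_WITH_HEAT_SINK_W = 10.0  # MQUAD limit with a heat sink, from the text
PGA_DEMONSTRATED_W = 25.0      # the 504-pin PGA had already dissipated this

def scaled_power_watts(power_watts, base_mhz, new_mhz):
    """Power after a clock change, assuming power is proportional to frequency."""
    return power_watts * new_mhz / base_mhz

def fits_package(power_watts, package_limit_watts):
    """True when the package can dissipate the given power."""
    return power_watts <= package_limit_watts

base_power = 8.0  # hypothetical dissipation at 60 MHz
upgraded_power = scaled_power_watts(base_power, 60, 80)  # about 10.7 W
```

In this made-up case the part fits the MQUAD at the initial
clock but not after a speed upgrade, while the PGA keeps
ample margin.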

The time schedule for the completion of reliability testing
and manufacturing feasibility studies had to be considered
when evaluating the two technologies. The PGA was a
mature technology with considerable experience behind it,
and the time schedule and results of the testing could be
determined with some certainty. The MQUAD was a new
technology by contrast. The design was solid, but had several
new features that were untested in terms of long-term
reliability. Despite the strong desire to exploit new
technology, the schedule risk was a significant factor.

By the time the partitioning decision was to be made, the
PGA cost had shrunk to almost half of its original cost, the
304-pin MQUAD was presenting schedule risks, and both
MQUADs had marginal power dissipation. Possibly most
important, the PGA provided a robust solution with thermal
and electrical margins. The cost difference was still
significant, but the PGA provided a flexibility to the chip
designers that offset its disadvantages. Thus, the PGA
package was chosen for the PA 7100LC.

Memory Controller Design. Whether or not to integrate the
memory and I/O controllers onto the CPU die was one of the
most direction-forming decisions that we made. To decide
correctly, we had to consider the effects of integration on
factors such as multiprocessor capability, system
complexity, memory and I/O controller design complexity, die
cost, memory system performance, and memory system
flexibility.

Traditional multiprocessor systems have a single main
memory controller and I/O controller (see Figs. 3a and 3b).
These controllers maintain connections to the multiple
processors. Systems organized in this way separate the
memory and I/O controllers from the CPU. This organization
allows users to upgrade entry-level systems to include
multiple processors at the expense of reducing the memory
and I/O performance of uniprocessor systems and adding
significant complexity to both the memory and cache
controllers.

Our design goals focused on maximizing uniprocessor
performance. HP was already shipping desktop multiprocessor
systems built around the PA 7100 microprocessor at the
time we were making these decisions. The market segment
that we were targeting for the PA 7100LC demanded peak
uniprocessor performance at a low system cost. Since our
target market didn't require multiprocessing as a system
option, we directed our efforts toward the benefits that we
could bring to a system through a focused uniprocessor
design.

Fig. 3. (a), (b) Multiprocessor architectures in which the memory
and I/O controller are separate from the CPUs. (c) A uniprocessor
system in which the memory and I/O controller are integrated into
the CPU chip.

Integrating the memory and I/O controller with the CPU in a
uniprocessor system (Fig. 3c) can have a dramatic effect on
reducing cache miss penalties by decreasing the number of
chip boundaries that the missing data must cross and by
allowing the memory and I/O controller early access to
important CPU internal signals. Miss processing on the memory
interface can effectively begin in parallel with miss
detection at the cache controller. An integrated memory
controller can even use techniques such as speculative
address issue to begin processing cache misses before the
cache controller detects a cache miss.

The reductions in CPI (cycles per instruction) that we could
achieve by integrating the memory controller allowed us the
degrees of freedom that we needed to explore certain cache
architectures in greater detail. Some of these architectures
are described in the next section.
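
The link between miss penalty and cache size can be sketched
with the usual first-order CPI estimate, base CPI plus
misses per instruction times miss penalty. This is a
textbook approximation rather than the detailed simulation
described above, and every number below is hypothetical.

```python
def effective_cpi(base_cpi, misses_per_instruction, miss_penalty_cycles):
    """First-order CPI estimate: base CPI plus average miss stall per instruction."""
    return base_cpi + misses_per_instruction * miss_penalty_cycles

# Hypothetical: a larger cache behind a slow, separate memory controller
# versus a smaller cache behind a faster, integrated one.
large_cache_slow_miss = effective_cpi(1.0, 0.02, 20)
small_cache_fast_miss = effective_cpi(1.0, 0.03, 12)
```

With these made-up inputs the smaller cache still yields the
lower CPI, which is the kind of freedom a reduced miss
penalty provides.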



System complexity is reduced with an integrated memory and
I/O controller. The 432-pin CPGA that we were considering
for an integrated design had sufficient signal headroom to
enable separate, dedicated memory and I/O connections. A
two-chip approach, using the lower-cost MQUAD packages,
would be forced to share pins between the memory and I/O
connections to accommodate the low signal count of the
MQUAD package, which would increase system complexity.

An integrated memory and I/O controller also simplifies the
interface to the CPU. Since this interface connects two
entities on the same die, signal count on the interface
became much less important, which allowed us to simplify the
interface design considerably.

On the down side, integrating a memory and I/O controller
required enough flexibility in its design to satisfy the broad
range of system customers that our chip would encounter.
However, this requirement also exists for a nonintegrated
solution. Historically, system partners have not redesigned
memory controllers that the CPU team has provided as part
of a CPU chipset. HP's advantage of providing both
processors and systems has allowed us to work closely with
system designers and enabled us to meet their needs in both
integrated and nonintegrated chipsets.

In summary, integrating the memory and I/O controller onto
the CPU core introduced a gain in performance, a reduction
in complexity and schedule risk, and several possibilities for
reduced cost in the cache subsystem. These were the
compelling reasons to move the memory controller onto the
CPU die and continue exploring cache alternatives and
optimizing memory system performance.

Cache Organization. One of the distinguishing characteristics
of HP PA-RISC designs over the past several implementations
has been the absence of on-chip caches in favor of
large, external caches. While competitors have dedicated
large portions of their silicon die to on-chip RAMs, HP has
continued to invest in aggressive circuit design techniques
and higher pin count packages that allow their processors to
use industry-standard SRAMs, while fetching instructions
and data every cycle at processor frequencies of 100 MHz
and above. This has allowed our system partners to take a
single processor chip and design products meeting a wide
range of price and performance points for markets ranging
from the low-cost desktop machines to high-performance
servers. For example, the PA 7100 chip has been used in
systems with cache sizes ranging from 128K bytes to 2M
bytes and processor frequencies ranging from 33 MHz to 100
MHz.

The main design goals for the PA 7100LC were low cost and
high performance. Unfortunately, high-performance systems
use large, fast, expensive caches. Obviously, trade-offs had
to be made. As with previous implementations, the designers
started with a clean slate and considered various cache
options, including on-chip cache only, on-chip cache with an
optional second-level cache, split instruction and data
off-chip caches, and combined off-chip caches (see Fig. 4).
Ultimately, the cache design was closely linked to the memory
controller design because of the large effect of memory
latency on cache miss penalties.

Fig. 4. Different cache organizations. (a) On-chip cache. (b) On-chip
cache with an optional second-level cache. (c) Split instruction
and data off-chip caches. (d) Combined off-chip caches.

On-chip caches have the obvious advantage that they can
allow single-cycle loads and stores at higher chip frequencies
than are possible with many off-chip cache designs. They
also allow designers to build split and associative cache
arrays which would be prohibitive for off-chip designs
because of the large number of I/O pins required.
Unfortunately, in current technologies on-chip caches tend to
be fairly small (8K bytes to 32K bytes) and even with
two-to-four-way associativity they have higher miss rates than
larger (64K bytes to 256K bytes) direct-mapped, off-chip
caches. Also, on-chip caches require a substantial amount of
chip area which translates to higher costs, especially for
chips using leading-edge technology with high defect
densities. This extra chip area also represents lost
opportunity cost for other features that could be included in
that area. Examples include an on-chip memory and I/O
controller, graphics controller, more integer execution units,
multimedia special function units, higher-performance
floating-point circuits, and so on.

Another drawback of on-chip caches is their lack of
scalability; providing multiple cache sizes requires
fabricating multiple parts. To overcome this limitation
designers can allow for optional off-chip caches. The
off-chip caches can range in size and speed and can provide
flexibility for system designers looking to meet different
price/performance choices. Low-end systems need not include
the off-chip cache and can be built for a lower cost.
High-end systems can get a performance boost by paying the
extra cost to add a secondary off-chip cache. For most
systems, the cost for this flexibility is added pin count to
allow for communication with the off-chip caches. Other
systems might be able to multiplex the cache lines onto some
already existing buses such as the memory bus.

For the PA 7100LC, we determined that a primary on-chip
cache would cost too much in terms of more expensive
technologies, increased die size, and the lost opportunity of
putting more functionality on the chip. Without a primary
on-chip cache, we were able to design a processor with two
integer units, a full floating-point unit including a divide
and square root unit, and a memory and I/O controller. We
achieved this functionality using only 905,000 FETs in
0.8-micrometer (CMOS26) technology on a die measuring 1.4
cm by 1.4 cm (see Fig. 5). CMOS26 is a mature HP process
that has been used for several processor generations. As a
result, it has a low defect density and thus a low cost. A
processor with an on-chip cache would have required a
more advanced technology having higher wafer costs and
defect densities. Of course, without an on-chip cache, we
were challenged to design a low-cost off-chip cache that
allowed accesses at the processor frequency.

HP's previous implementations of PA-RISC were built with
independent instruction and data caches made up of
industry-standard SRAMs (see Fig. 4c). It would have been
easy to leverage the independent direct-mapped instruction
and data cache design from the PA 7100, but we were
determined to find a less expensive solution. Independent
cache banks require a high pin count on the processor chip
because each bank requires 64 data pins and about 24 pins for
tag, flags, and parity. Thus, combining instructions and data
into a single set of cache RAMs (Fig. 4d) saves about 88 pins
on the processor chip. These extra pins directly affect
packaging costs. Also, providing split caches requires using
more SRAM parts in a given technology. Systems based on the
PA 7100LC with a combined cache require only 12 SRAM parts
using x8* technology. By leveraging the aggressive I/O
design from previous implementations, the PA 7100LC can
access 12-ns SRAM parts every cycle when operating at
frequencies up to 66 MHz. Since 8K x 8, 12-ns SRAMs are
commodities in today's market, the cost of a 64K-byte cache
subsystem for a 60-MHz PA 7100LC is comparable to the price
we would have paid for a much smaller on-chip cache.
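
The pin and timing arithmetic in this paragraph can be
checked with a few lines. The pin counts, the 12-ns access
time, and the 66-MHz clock come from the text; the function
names are illustrative, and the tag pin count is the text's
approximate figure.

```python
DATA_PINS_PER_BANK = 64
TAG_PINS_PER_BANK = 24  # "about 24" for tag, flags, and parity

def cache_pins(banks):
    """Processor pins needed to reach the given number of cache banks."""
    return banks * (DATA_PINS_PER_BANK + TAG_PINS_PER_BANK)

def cycle_time_ns(freq_mhz):
    """Clock period in nanoseconds."""
    return 1000.0 / freq_mhz

pins_saved = cache_pins(2) - cache_pins(1)    # split versus combined: 88
timing_slack_ns = cycle_time_ns(66) - 12.0    # roughly 3 ns left per cycle
```

The slack is what remains of each cycle for driving the
address off the chip and bringing the data back, which is
why the aggressive I/O design matters.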

Combined instruction and data caches have one large
drawback. Since the PA 7100LC processor can consume
instructions as fast as the cache can deliver them, there is
little or no cache bandwidth left to satisfy load and store
operations. To solve this problem, we needed to implement
some type of instruction buffer on the processor chip. A
large instruction buffer would have all the drawbacks of the
on-chip cache design discussed above, so we were determined
to find a way to achieve the desired performance with a small
buffer. We knew we would need a mechanism to prefetch
instructions from the off-chip combined cache into the
dedicated on-chip buffer during idle cache cycles. Thus, we
started with a standard direct-mapped 2K-byte buffer and
simulated various prefetch and miss algorithms. As expected,
we found that performance was extremely sensitive to the
buffer miss penalty, which ranges from zero to two states
* RAM sizes are quoted in depth by width (i.e., 64K x 8 is 65,536 deep by 8 bits wide).

1. DRAM Data I/O Drivers
2. Memory Address and Control I/O Drivers
8. Floating-Point Registers
9. Clock
10. General System Connect (GSC) Bus I/O Drivers
11. CPU Control
12. Cache Control
13. Floating-Point Control
14. Floating-Point Instruction Stack
15. Unified Cache Data I/O Drivers
16. Cache Control Data Path
17. Instruction Buffer
18. Test Access Port (Test Circuits)
19. Clock
20. Control Registers
21. TLB (Translation Lookaside Buffer)
22. Integer Data Path
23. Unified Cache Tag I/O Drivers
24. Cache Address I/O Drivers

Fig. 5. A photomicrograph of the PA 7100LC CPU. The die
measures 1.4 cm by 1.4 cm and contains 905,000 FETs in
0.8-micrometer CMOS26 technology.


for a two-word cache line, depending on branches and
prefetches. We designed the prefetch machine to use virtually
every idle cache cycle and tried to get early access to the
off-chip cache on branches. Some branches can be treated
like loads or stores and be given access to the off-chip cache
even before they have access to the on-chip cache. Using a
small buffer with a good prefetch algorithm we were able to
greatly mitigate the penalty associated with having a single
bus to the off-chip cache. Of course, if cost had not been
such an important factor, we still would have implemented
split off-chip caches to get the extra performance and to
reduce complexity.
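
A direct-mapped instruction buffer of the kind used as the
starting point can be modeled in a few lines. This is a
minimal sketch for illustration only: it counts hits and
misses and does not model the prefetch machine or the miss
penalties discussed above. The class name is hypothetical,
and the 8-byte line assumes the two-word line mentioned in
the text with 4-byte words.

```python
class DirectMappedBuffer:
    """Tag-only model of a direct-mapped instruction buffer (no prefetch)."""

    def __init__(self, size_bytes, line_bytes):
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        """Look up one instruction address; return True on a hit."""
        line_number = address // self.line_bytes
        index = line_number % self.num_lines   # which buffer line to check
        tag = line_number // self.num_lines    # identifies the memory line held there
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag                 # fill the line on a miss
        self.misses += 1
        return False

# A 16-instruction loop executed twice in a 2K-byte buffer with 8-byte lines.
buffer_2k = DirectMappedBuffer(2048, 8)
for _ in range(2):
    for address in range(0, 64, 4):
        buffer_2k.fetch(address)
```

In this run the first pass misses once per line and the
second pass hits every time. Replaying an instruction trace
through such a model with different sizes is the simplest
form of the buffer-size comparison described in the text.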

After settling on prefetch and miss algorithms, we simulated
various buffer sizes, associativity, and line sizes to
determine the optimal configurations. We found that
associativity increased performance by less than 1% while
increasing area and complexity. We also found that if we
decreased the buffer from 2K bytes to 1K bytes and used the
resulting area savings to increase the TLB (translation
lookaside buffer) size from 48 to 64 entries, we could gain
about 1% performance improvement without any added
complexity or area. Thus, we chose a 1K-byte direct-mapped
instruction buffer and a 64-entry TLB. We also simulated
other buffer options requiring less chip area, including a
split buffer design with a 128-byte branch target buffer and
a 256-byte prefetch buffer. We had hoped that the prefetch
buffer could keep up with the instruction demand for
sequential code while the branch target buffer supplied
instruction targets for branches. Unfortunately such a small
branch target buffer could not hold most of the recently
taken branch targets and the performance was 2 to 5 percent
less than the 1K-byte buffer.

Given the cost constraints, designing the off-chip cache was
still fairly straightforward. As explained above, we wanted to
have a single cache structure to hold both instructions and
data. We had the choice of designing a unified cache that
allows instructions and data to reside anywhere in the cache,
or a logically split cache that divides the structure into
distinct halves by using an address bit to distinguish between
instructions and data. Unified caches have the advantage
that they dynamically allocate more or less of the cache to
instructions or data as appropriate for the application. This
feature gives them slightly less than a 1% advantage for most
benchmarks. Unfortunately, applications have a greater
probability of thrashing a unified cache by accessing the same
cache index for both instructions and data. Because of the
potential thrashing problem and to make control algorithms
easier, we chose to implement the logically split cache.
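
The difference between the two indexing schemes can be
sketched as follows. The text says only that an address bit
distinguishes instructions from data; using the top index
bit for that purpose, the function names, and the example
address are assumptions made for illustration. The 32-byte
line follows from the four-double-word line mentioned later
in the article.

```python
def unified_cache_index(address, num_lines, line_bytes):
    """Index in a unified cache: instructions and data share every line."""
    return (address // line_bytes) % num_lines

def split_cache_index(address, is_data, num_lines, line_bytes):
    """Index in a logically split cache: one extra bit selects the data half."""
    half = num_lines // 2
    index_in_half = (address // line_bytes) % half
    return index_in_half + (half if is_data else 0)

# A 64K-byte cache with 32-byte lines has 2048 lines.
LINES, LINE_BYTES = 2048, 32
instruction_index = split_cache_index(0x1000, False, LINES, LINE_BYTES)
data_index = split_cache_index(0x1000, True, LINES, LINE_BYTES)
```

In the unified scheme an instruction and a data item at
addresses that share index bits compete for one line; in the
split scheme they can never evict each other, at the cost of
fixing each half's share of the cache.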

Besides the combined cache structure, another interesting
difference between the PA 7100 and PA 7100LC cache designs
is the different DRAM configurations. The PA 7100LC
is designed to access one double word (eight bytes) per
DRAM access to allow smaller systems to be built with only
nine eight-bit-wide parts, whereas most PA 7100 systems
access two double words per DRAM access and require at
least eighteen eight-bit-wide parts. The implication of this is
that PA 7100-based systems can buffer the DRAM data and
return a double word every two cycles, which matches the
two-cycle write time required to copyin to the cache. PA
7100LC-based systems, on the other hand, are DRAM-limited
and can return a double word only every three cycles. Thus,
on a line copyin from memory, a PA 7100 will lock the cache
for eight cycles (4 double words x 2 cycles/double word).
Had the PA 7100LC leveraged the PA 7100's control
algorithms, it would have locked the cache for 12 cycles (4
double words x 3 cycles/double word) for every cache miss.
We found that by changing the control algorithms and
opening one-cycle windows to the cache during copyins, we could
allow loads, stores, misses, or prefetches to occur and we
gained over 10% in overall performance on most benchmarks.
This large increase indicates how seemingly small
changes between processors can have dramatic effects.
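
The cycle counts quoted above follow from simple arithmetic, which
the sketch below reproduces. The function names are illustrative.
The count of idle cycles assumes that each three-cycle DRAM period
leaves exactly one cycle in which the cache is not being written;
that is an inference from the two-cycle write time, not something
the text states.

```python
DOUBLE_WORDS_PER_LINE = 4   # from the text: 4 double words per copyin
CACHE_WRITE_CYCLES = 2      # from the text: two-cycle write per double word


def copyin_lock_cycles(cycles_per_double_word):
    """Cycles the cache stays locked if held for the whole copyin."""
    return DOUBLE_WORDS_PER_LINE * cycles_per_double_word


def idle_cycles_during_copyin(cycles_per_double_word):
    """Cycles per copyin in which the cache is not being written
    (assumed here to be the one-cycle windows that can be opened)."""
    idle_per_word = cycles_per_double_word - CACHE_WRITE_CYCLES
    return DOUBLE_WORDS_PER_LINE * idle_per_word


pa7100_lock = copyin_lock_cycles(2)
pa7100lc_lock = copyin_lock_cycles(3)
pa7100lc_windows = idle_cycles_during_copyin(3)
```

Under these assumptions a third of the PA 7100LC copyin period is
idle cache time, which is consistent with the gain reported from
opening the windows.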

Performance 

Our decision to integrate the memory controller onto the
CPU die required that we carefully consider other
performance features in that light. Features generally take up
silicon area, and at our target 14 x 14 mm², chip area was at
a premium. We needed to ensure that our design would
continue to fit into our target die size. We also needed to
minimize the active area on the die to reduce the cost of the
processor.

These considerations led us to search for simple means to
free area on the die that had little impact on performance.

Floating-Point Unit. The floating-point performance of the PA
7100 was so strong that we had the option of trading some
of it away to reduce cost for the PA 7100LC. About 25 mm²
of the PA 7100 die area was devoted to the floating-point
data path. Performance simulations indicated that if we
copied it unchanged into the PA 7100LC it would achieve a
performance of at least 130 for SPECfp92 at 80 MHz.



We considered several schemes for scaling back the
floating-point unit. One idea was to delete the divide and square-root
block. Divides and square roots would be implemented in
hardware by iterating through the multiplier with a
Newton-Raphson or Goldschmidt algorithm. The performance loss
for this change would be negligible, and we would save 1.5
mm². However, it would have introduced a significant amount
of new complexity to the multiplier and the area saved was
not that great considering the excellent compactness of the
existing divide and square-root block. We decided that the
area saved was not worth the schedule risk, so we kept the
divider. Complexity is very difficult to quantify, but as a
project moves through its development cycles, an earlier decision
to simplify something is almost always remembered with a
feeling of great relief. This decision was no exception.
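
For reference, the Newton-Raphson approach mentioned above computes
a reciprocal using only multiplies and subtracts, which is why it
could reuse the multiplier. The sketch below is a software
illustration only, not the hardware scheme that was considered. The
function names, the linear initial estimate, and the iteration count
are conventional textbook choices assumed for the example, and only
positive divisors are handled.

```python
import math


def nr_reciprocal(d, iterations=4):
    """Approximate 1/d for d > 0 with Newton-Raphson iteration."""
    # Normalize so that d = m * 2**e with 0.5 <= m < 1.
    m, e = math.frexp(d)
    # Linear first guess for 1/m on [0.5, 1); worst-case error is 1/17.
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        # Each step roughly squares the relative error.
        x = x * (2.0 - m * x)
    # Undo the normalization: 1/d = (1/m) * 2**(-e).
    return math.ldexp(x, -e)


def nr_divide(n, d, iterations=4):
    """Divide by multiplying the numerator by the reciprocal."""
    return n * nr_reciprocal(d, iterations)


quotient = nr_divide(3.0, 1.5)
```

Each iteration costs two multiplies and one subtract, so a divide
built this way occupies the multiplier for several passes. That is
the extra multiplier complexity weighed against the 1.5 mm² saving.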

Another proposal for reducing area was to fold the
multiplier array. Multiplication on the PA 7100 is performed in
four phases (two clock cycles). The partial products are
summed during the middle two phases by a tree of dynamic
carry-save adders (see Fig. 6a). If we used a smaller tree of
carry-save adders, single-precision multiplies, with their
24-bit mantissas, could still be evaluated in one pass, but
double-precision multiplies, with their 53-bit mantissas,
would go through the tree twice (Fig. 6b). We found that
folding the multiplier array for the PA 7100LC would save
about 3 mm², but it increased the overall double-precision
multiply latency from two cycles to three cycles.
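
A carry-save adder reduces three operands to two without propagating
carries, so a tree of them can sum many partial products quickly.
The sketch below models that reduction in software. It is not a
model of the actual PA 7100 or PA 7100LC array: it assumes one
partial product per multiplier bit (no Booth recoding), groups
operands in an arbitrary order, and uses invented function names.

```python
def carry_save_add(a, b, c):
    """Reduce three operands to a (sum, carry) pair, no carry ripple."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def multiply_with_csa_tree(x, y, mantissa_bits):
    """Multiply unsigned mantissas by summing partial products with
    carry-save adders. Returns (product, reduction_levels)."""
    operands = [x << i for i in range(mantissa_bits) if (y >> i) & 1]
    levels = 0
    while len(operands) > 2:
        full_groups = len(operands) // 3
        reduced = []
        for g in range(full_groups):
            reduced.extend(carry_save_add(*operands[3 * g:3 * g + 3]))
        reduced.extend(operands[3 * full_groups:])  # leftovers pass through
        operands = reduced
        levels += 1
    # One final carry-propagate addition of the last two operands.
    return sum(operands), levels


single = (1 << 24) - 1
double = (1 << 53) - 1
single_product, single_levels = multiply_with_csa_tree(single, single, 24)
double_product, double_levels = multiply_with_csa_tree(double, double, 53)
```

The 53-bit case needs more reduction levels than the 24-bit case,
which illustrates why a tree sized for single precision has to be
traversed a second time for double-precision operands.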

Low-level graphics software can be sensitive to
floating-point latencies, so we consulted our partners in the graphics
software lab. They determined that folding the multiplier
array would be acceptable for the HP 9000 Model 712
workstation because the relevant software used mostly
single-precision math. We simulated the effect of the higher
latency on some of the SPECfp benchmarks. The geometric
mean of the benchmarks lost less than Wn performance, but
the losses for individual benchmarks varied widely, from 1%
to 13%. We had some concern about the variance because
large customers frequently use their own benchmarks, some
of which are bound to be sensitive to double-precision
multiply performance. But even 13% was judged to be an
acceptable trade-off for the area involved, so we decided to fold
the multiplier.

By the end of the project we found that 3 mm² was not
nearly as valuable as we first thought it would be. However,
the area saved by folding the multiplier was removed from a
critical chip dimension shared with the new memory and I/O
controller, so the decision to fold the multiplier was solid.

We simplified the floating-point controller by stalling the
pipeline unconditionally during the execution of a divide,
square root, or double-precision multiply. On the PA 7100 a
divide or square root conditionally stalls the pipeline until a
subsequent instruction tries to use its result. However, this
conditional divide stall was a source of bugs late in the
design cycle of that chip, so this simplification positively
affected the PA 7100LC schedule.

The performance loss for this change was estimated at 1% for
divide and square root and 2% for double multiply. The area
savings was small, but the savings in complexity persuaded
us to make this change.

The performance loss because of the floating-point changes
turned out to be as small as we expected. Floating-point
performance is often dominated by cache size and memory
latency. PA 7100LC-based systems typically have a smaller
cache but faster memory than PA 7100-based systems. The
final product achieved over 120 for SPECfp92 at 80 MHz,
which was significantly higher than the competition and
compares surprisingly well with the larger and faster PA 7100
floating-point performance.

Dual Issue. PA 7100LC-based systems needed to perform as
well as midrange PA 7100-based systems on integer code,
but with smaller caches and a lower CPU frequency.
Superscalar* execution is a classic method of improving
performance at a given frequency. The PA 7100 has superscalar
execution so much of the control infrastructure was already
in place for our needs. However, the PA 7100 has only one
integer and one floating-point execution unit, allowing only
floating-point code to be accelerated. Performance goals for
the PA 7100LC were focused on integer applications, so we
investigated the possibility of adding a second integer
execution unit for "integer dual issue."

Our aggressive schedule allowed very little time to
investigate the addition of a second integer execution unit. We
identified three options for the classes of instructions we
might be able to execute in parallel. For each option we
estimated the cost in engineering time, area, and possible impact
on our time to market. The benefits of each option were
predicted using simulation of benchmark instruction traces.

Loads and stores typically represent about 40% of all
instructions executed, so the first option was to split the existing
integer execution unit into one that could do loads and
stores and one that could do everything else. This would
enable us to execute a load or store in parallel with some
other type of instruction. The second option added a full
ALU in the load and store unit, which would also allow two
arithmetic or logical instructions to execute at a time. The
third option added a specialized way to execute two loads
or two stores that happen to be referring to adjacent
memory locations.
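
The three options can be read as progressively looser pairing rules.
The sketch below is a toy model of those rules, not the trace
simulator that was actually used. Instruction kinds, the tuple
format, the 4-byte adjacency test, and both function names are
assumptions made for illustration, and data dependencies between
paired instructions are ignored.

```python
MEMORY_KINDS = {"load", "store"}


def can_pair(first, second, option):
    """Return True if two adjacent instructions may issue together.

    Each instruction is a (kind, address) tuple; address is None for
    non-memory instructions. Option 2 includes option 1, and option 3
    includes both.
    """
    first_is_mem = first[0] in MEMORY_KINDS
    second_is_mem = second[0] in MEMORY_KINDS
    # Option 1: one load or store alongside any non-memory instruction.
    if first_is_mem != second_is_mem:
        return True
    # Option 2: additionally, two ALU instructions.
    if option >= 2 and first[0] == "alu" and second[0] == "alu":
        return True
    # Option 3: additionally, two loads or two stores to adjacent words.
    if option >= 3 and first_is_mem and first[0] == second[0]:
        return abs(first[1] - second[1]) == 4
    return False


def issue_cycles(trace, option):
    """Count issue cycles for a trace, pairing greedily in order."""
    cycles, i = 0, 0
    while i < len(trace):
        if i + 1 < len(trace) and can_pair(trace[i], trace[i + 1], option):
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles


trace = [("alu", None)] * 4 + [("store", 8), ("store", 12)]
cycles_by_option = [issue_cycles(trace, option) for option in (1, 2, 3)]
```

On this hand-made six-instruction trace each option saves one more
issue cycle than the one before it. Real traces are far less
favorable, which is why the benefits had to be measured by
simulation.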

The performance of each option was not trivial to estimate.
The benchmarks were compiled for current machines with
one integer unit so the compiler made no attempt to
schedule instructions in such a way that adjacent instructions had
no data dependencies. This led to lower performance
estimates than we would have expected with an optimized
compiler. The performance lab addressed this problem by
reordering the instructions within each trace before simulation.
The reordering tool scheduled instructions to avoid
functional unit contention and data dependencies, using a range
of assumptions about our future compiler technology.

Performance improvement was measured on six
benchmarks from SPECint92 and TPC-A. The first option, load and
store plus ALU operation, gained 1% to 7% performance
improvement for the benchmarks using conservative compiler
assumptions and 9% to 23% using optimistic compiler
assumptions. The second option, supporting two ALU
operations and the first option, gained another 1% to £>%
performance improvement. The third option, additionally
supporting two loads or stores, gained another 1% to 3%
performance improvement. At the time, the performance
gain for these last two options seemed discouragingly low.

* CPU architecture that allows the execution of more than one instruction in a single clock cycle.

The cost of the first option was estimated at about one
engineering year of effort and 3 mm² of area. The second full
ALU would add a few more months of effort and less than
1 mm² of area. The double load and store option would add
a few more months of effort but no significant area. Perhaps
the greatest cost factor was schedule risk because of
increased complexity. Functional bugs late in the design cycle
can affect time to market, and similar functionality issues
had been a source of bugs on other chips. However,
experience gained with the superscalar PA 7100 design made us
confident that adding integer dual issue would not limit our
schedule.

Ultimately, concern about our competition led us to
implement all three options. Also, while the performance
improvement estimates on SPECint92 might seem small, some highly
tuned applications can derive enormous benefit. One example
is the software MPEG video decoder described in the article
on page 60. The HP 9000 Model 712 can display MPEG video
with stereo audio at full frame rate without special-purpose
hardware, and a significant part of this achievement comes
from the PA 7100LC executing two integer ALU instructions
at a time.

Architectural Enhancements 

We added three new architectural features to the PA 7100LC
implementation: little-endian addressing, uncachable
memory pages, and multimedia instructions. The first two
features are present in several of today's microprocessors and
represent the evolution of modern RISC architectures.
Little-endian addressing allows for more efficient execution of
code compiled for other platforms and enables the use of
new multivendor operating systems such as Windows NT.
Uncachable memory pages increase the efficiency of code
sharing cache lines between the processor and I/O and are a
less expensive solution than implementing systems with
coherent I/O.
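
As a reminder of what byte order means in practice, the short sketch
below stores the same 32-bit value both ways. It illustrates the
general concept only and says nothing about how the PA 7100LC
selects its addressing mode; the variable names and the value are
arbitrary.

```python
value = 0x12345678

# Big-endian: most significant byte at the lowest address.
big_endian_bytes = value.to_bytes(4, "big")

# Little-endian: least significant byte at the lowest address.
little_endian_bytes = value.to_bytes(4, "little")

# Reading little-endian data with big-endian rules scrambles the value,
# which is why code and data from little-endian platforms otherwise
# need explicit byte swapping.
misread = int.from_bytes(little_endian_bytes, "big")
```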

The multimedia features are more specific to the PA 7100LC.
In late li^U, HP created a multidivisional team of hardware,
software, and architecture experts responsible for creating
the technologies that would enable a low-cost workstation
to be multimedia capable without the cost of dedicated
multimedia hardware. At that time, many standards for
video compression were emerging. Of these, JPEG (Joint
Photographic Experts Group) and MPEG (Moving Pictures
Experts Group) looked most promising for still-frame and
full-motion video respectively. Since workstations serve as
decode-only clients in most environments, the team decided
to focus on building an efficient decompression engine,
while leaving the more complex task of video compression
to be done offline or by high-end servers.

Initial experiments with JPEG and MPEG performance were
done using public domain software running on an HP 9000
Model 720 workstation. Even after extensive algorithm
changes and software enhancements, the performance was
still far below the ultimate goal of real-time video at 30
frames/s. One time-intensive component of the encode and
decode algorithms is the discrete cosine transform (DCT).
The DCT requires a large number of multiplies and adds,
weighted differently depending on the algorithm. Since
PA-RISC directly supports multiply instructions in the
floating-point unit but not in the integer unit, we initially used
floating-point arithmetic for the DCT and found algorithms that
could take full advantage of the multi-operation FMPYADD
(floating-point multiply and add) instruction.

While the floating-point unit was efficient at providing a
multiply and an add in a single cycle, it was inefficient at
packing and unpacking data, normalizing results, and
saturating results to maximum or minimum values. Thus, we
found that a lot of time was spent converting values between
integer and floating-point representations to accomplish both
the multiply-adds and the data manipulations. To eliminate
the conversions, we investigated the possibility of adding a
multiplier to the integer data path but found the area
requirements to be prohibitive for a low-latency 16-bit or 32-bit
multiplier. Given that JPEG and MPEG operate on 8-bit data,
building an 8-bit multiplier might have been feasible but
extra instructions for normalization of intermediate results
would have been required.

PA-RISC has always provided shift-and-add instructions as
primitives for software emulation of integer multiplication.
These instructions shift a register value left by one, two, or
three bits and add the result to a second register value.
Using these instructions, multimedia software can multiply a
16-bit value by an 8-bit constant with a sequence of one to
three instructions. We found that by picking a DCT that was
biased away from multiplications in favor of additions, the
shift-and-add instructions provided good performance
compared to the other options mentioned above. The deciding
factor, though, was the ability to add parallelism to the
shift-and-add instructions along with the normal adds.
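
The way a shift-and-add primitive stands in for a multiply can be
shown with two small constants. The sketch below models the
primitive as described above (shift left by one, two, or three bits,
then add a second value). The function names and the constants 10
and 45 are chosen for illustration, and 32-bit wraparound is
assumed.

```python
def shift_and_add(shift, a, b):
    """Model of a shift-and-add primitive: (a << shift) + b."""
    if shift not in (1, 2, 3):
        raise ValueError("shift must be 1, 2, or 3")
    return ((a << shift) + b) & 0xFFFFFFFF


def times_10(x):
    """Multiply by 10 in two shift-and-add steps."""
    five_x = shift_and_add(2, x, x)       # 4x + x = 5x
    return shift_and_add(1, five_x, 0)    # 2 * 5x = 10x


def times_45(x):
    """Multiply by 45 in two shift-and-add steps."""
    five_x = shift_and_add(2, x, x)           # 4x + x = 5x
    return shift_and_add(3, five_x, five_x)   # 8 * 5x + 5x = 45x
```

Constants whose binary form decomposes this way cost only one to
three such steps, which is why a DCT formulation with few, simple
multiplications suits this approach.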

As mentioned above, JPEG and MPEG operate on 8-bit data
and it is convenient to store intermediate results as 16-bit
values. Thus, it seemed reasonable to split the 32-bit data
paths of the ALUs to achieve two 16-bit operations per ALU
per cycle. With a slight redesign of the integer ALU, it was
possible to break the carry chain, force carry-ins as
necessary, and allow for proper preshifting of both 16-bit values
packed in the 32-bit registers. We also changed the
preshifter to allow the shift-and-add operations to support
division by allowing for right-shifts as well as left-shifts. Given
the PA 7100LC's dual ALU design, these hardware changes
allowed us to achieve four 16-bit adds, subtracts, or
shift-and-adds per cycle. This brought us closer to our design goal
of 30 frames/s for video decode, but more work was needed.

The next step was to add saturation logic to the ALU. When
adding pixel or audio values, it is often desirable to "clip"
the result to the smallest or largest possible value as a result
of underflow or overflow, respectively. (This is called
arithmetic saturation.) By specifying a completer* to the new
16-bit instructions, the hardware can be set to saturate the
result automatically using either signed or unsigned
arithmetic. We also added an instruction to calculate an average by
adding two registers and shifting the result right by one bit.
Averaging is used in MPEG and other algorithms to
interpolate between two values.

* A completer is a part of the instruction mnemonic specifying an option; for example, in
ldw,m the completer m specifies address modification for the load word (ldw) instruction.
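
The effect of splitting a 32-bit register into two independent
16-bit lanes, with and without saturation, can be modeled as
follows. This is a behavioral sketch only. The function names are
invented, only signed saturation is shown, and the average simply
truncates; none of this is taken from the actual instruction
definitions.

```python
MASK16 = 0xFFFF


def lanes(word):
    """Split a 32-bit value into (high, low) unsigned 16-bit lanes."""
    return (word >> 16) & MASK16, word & MASK16


def pack(high, low):
    """Pack two 16-bit lane results back into one 32-bit value."""
    return ((high & MASK16) << 16) | (low & MASK16)


def to_signed(lane):
    """Interpret a 16-bit lane as a two's complement number."""
    return lane - 0x10000 if lane & 0x8000 else lane


def add16_modular(a, b):
    """Lane-wise add; a carry out of one lane never reaches the other."""
    return pack(*[x + y for x, y in zip(lanes(a), lanes(b))])


def add16_signed_saturating(a, b):
    """Lane-wise add that clips each result to the signed 16-bit range."""
    results = []
    for x, y in zip(lanes(a), lanes(b)):
        total = to_signed(x) + to_signed(y)
        results.append(max(-0x8000, min(0x7FFF, total)))
    return pack(*results)


def average16(a, b):
    """Lane-wise unsigned average: add, then shift right by one bit."""
    return pack(*[(x + y) >> 1 for x, y in zip(lanes(a), lanes(b))])
```

For example, adding 1 to a lane holding 0x7FFF wraps to 0x8000 in
the modular version but stays at 0x7FFF in the saturating one, which
is the clipping behavior wanted for pixel and audio values.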

Once again, these new features were incremental changes to
the integer ALU design, resulting in very little area overhead
and no critical speed paths. Using these new features, an
80-MHz PA 7100LC can achieve MPEG decompression rates
of 30 frames/s with no sound using CIF (352 by 240)
resolution. With full stereo sound, a rate of 25 frames/s can be
achieved. The PA 7100LC is the first processor capable of
achieving these rates without the added expense of
dedicated multimedia hardware. The article on page 60
describes these multimedia features in more detail.

Conclusion 

Correctly deciding which features should (and should not)
be included in a product is fundamental to the product's
success. Design decisions are often strongly connected and
often require appropriately crafted supporting design
methodologies. Processor designers must make design decisions
in areas such as package technology, degree of integration,
cache organization, number of execution units, pipeline
organization, and floating-point functionality. With the
PA 7100LC processor, Hewlett-Packard has demonstrated an
ability to make design decisions in a manner that leads to
products having a strong competitive position in the areas of
cost, performance, quality, and time to market.
Design Methodologies for the
PA 7100LC Microprocessor 

Product features provided in the PA 7100LC are strongly connected to the 
methodologies developed to synthesize, place and route, simulate, verify, 
and test the processor chip. 

by Mick Bass, Terry W. Blanchard, D. Douglas Josephson, Duncan Weir, and Daniel L. Halperin



Engineers who wish to create a leading-edge product with
competitive performance, features, cost, and time to market
are often challenged to create design methodologies that
will enable them to succeed in their task. Decisions about
the features of a product usually have an inseparable impact
on the methodologies used to create, verify, debug, and test
the product.

During the development of the PA 7100LC microprocessor,
engineers crafted several methodologies that supported the
design decisions that were made throughout the project
and provided the framework for implementing the design
decisions.

This article explores several of these methodologies. For
each methodology, we discuss the design decisions that
impacted the methodology, the alternatives that we considered,
and the course that we chose. We discuss the results
produced by each methodology, as well as problems that we
encountered and overcame during each methodology's
development and use.

Some of the design decisions that motivated us to develop
new design methodologies for the PA 7100LC are discussed
in the article on page 12. The areas in which we developed
these methodologies include control synthesis, place and
route, production test, processor diagnosability, presilicon
verification, and postsilicon verification.

The resultant methodologies were crucial to our ability to
meet the design goals that we had set for the PA 7100LC.
Taken together, they enabled good decisions leading to a
successful product implementation.

Synthesis and Routing Methodology 

The control circuits in any microprocessor typically represent
a major portion of the complexity of the chip. The control
circuits of the chip contain most of the chip's intelligence. It
is these circuits that direct the rest of the components on the
chip. The operation of the control circuits is similar to the
way operators of complex machines on a factory floor
control the way that those machines behave.

Blocks of control circuitry perform similar jobs, and the
nature of these jobs determines the nature of the control
blocks themselves. Control blocks typically implement logic
equations, the outputs of which control some other function
present on the chip. The logic equations implemented by
control blocks tend to be irregular and loosely structured. A
necessary characteristic of any control block is for its
outputs to become valid in sufficient time to control its
downstream circuits properly. Like other portions of the chip,
control blocks can have timing paths that limit the overall
chip operating frequency if the blocks are not carefully
designed and implemented.

Another characteristic of blocks that implement control
logic is that they change frequently throughout the design
process. Experience has shown that a vast majority of bugs
are found in the control blocks, probably because so much
of the chip complexity resides there. We have found that it is
very likely that the last bugs fixed before a chip design is
sent to manufacturing will be in these blocks.

When we were defining the methodology for implementing
the control circuits for the PA 7100LC, we considered these
general characteristics, as well as specific new requirements
that stemmed from our design goals for the project. The PA
7100LC had new requirements, compared to earlier CPUs, in
the areas of low power dissipation and support of IDDQ
testing. We knew that the PA 7100LC control would be even
more complex than past CPUs because of its high level of
integration and its superscalar design. To make it easy to
accommodate this new functionality, we wanted to be able to
make the control blocks as small and as flexibly shaped as
possible. Finally, since we were leveraging the design of the
PA 7100LC processor from the PA 7100 processor, we
wanted to leverage control equations or control circuitry
from the past design for many of the blocks.

The control of the PA 7100, from which we were leveraging,
is primarily implemented as a programmable logic array
(PLA). Programmable logic arrays have very regular
physical and timing characteristics. The PLA architecture used in
the PA 7100 involves dynamically precharged and
pseudo-NMOS circuits. The outputs of this PLA become true at least
one CPU state after its inputs become valid. The PLA latches
all inputs with respect to a specific fixed clock edge.

PLA Methodology. The methodology used to design PLAs for
the PA 7100 was well developed as were the tools that were
necessary to support it. PLAs were designed in a high-level
language with a syntax reminiscent of the Pascal
programming language. In-house tools were available to translate the
high-level source language to optimized Boolean
sum-of-products equations. Other in-house tools were available to
use these sum-of-products equations to generate the PLA
artwork (including programming the array).
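
A sum-of-products equation maps directly onto a PLA: an AND plane
forms the product terms and an OR plane combines them into outputs.
The sketch below evaluates equations in that two-plane form. The
data layout, the function name, and the example "stall" equation
with its signal names are hypothetical and are not taken from the PA
7100 control logic or its in-house tools.

```python
def evaluate_pla(product_terms, outputs, inputs):
    """Evaluate sum-of-products equations the way a PLA does.

    product_terms: list of dicts mapping input name -> required value
                   (the AND plane).
    outputs: dict mapping output name -> list of product-term indices
             (the OR plane).
    inputs: dict mapping input name -> bool.
    """
    term_values = [
        all(inputs[name] == required for name, required in term.items())
        for term in product_terms
    ]
    return {
        name: any(term_values[i] for i in term_indices)
        for name, term_indices in outputs.items()
    }


# Hypothetical equation: stall = cache_miss OR (fp_busy AND fp_op)
PRODUCT_TERMS = [
    {"cache_miss": True},
    {"fp_busy": True, "fp_op": True},
]
OUTPUTS = {"stall": [0, 1]}

result = evaluate_pla(
    PRODUCT_TERMS,
    OUTPUTS,
    {"cache_miss": False, "fp_busy": True, "fp_op": True},
)
```

Every output is evaluated through the same two planes regardless of
how simple its equation is, which reflects both the regular timing
and the fixed shape that the text attributes to PLAs.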

When the destination circuits could not tolerate the one-state
delay required by the PLA core, we created schematics for
handcrafted standard-cell blocks that could calculate their
outputs in the required time. We then used an in-house
channel router to create artwork for the standard-cell blocks.

The PA 7100 PLA methodology had several advantages. The
PLA design and implementation tools were simple and
well-understood. They provided a turnkey artwork generation
solution from the high-level control equations, which made
it easy to accommodate late changes. Most important, we
already had a high investment in this methodology. We
understood it very well, had all the required tools in place, and
knew we wouldn't find any surprises.

However, when considered in light of the requirements of
the PA 7100LC, the PLA methodology had several
disadvantages. Although the physical structure of a PLA is fixed and
very regular, its fixed shape would lead to difficulty in floor
planning for a chip as highly integrated as the PA 7100LC. We
also knew that PLA implementations of control logic do not
yield optimal circuits with respect to absolute size. PLA
circuits involve both precharged logic and pseudo-NMOS logic,
leading to high power dissipation relative to fully static
circuits. PLA circuits are also incompatible with our IDDQ test
methodology, which is described later in this article.
Although PLAs can usually guarantee a one-state delay from
input to output, their timing is inflexible. The addition of
hand-designed standard-cell blocks to address this problem
is not only labor-intensive, but also adds complexity to the
overall solution and increases the probability of
introducing bugs in these areas. Also, some types of control logic
cannot be represented compactly in the sum-of-products
form required by the PLA methodology. This logic must then
either be moved into a standard-cell block or redesigned.

New Methodology. Since the disadvantages of the PLA
methodology would compromise our ability to achieve our design
goals, we began to investigate alternatives. We had some
positive experience with using Synopsys, a commercial
synthesis tool, on the floating-point control block of the PA
7100. We began to investigate the potential impact of
combining automated synthesis using Synopsys with an
over-the-cell router.† Our investigation of combining the synthesize
and route methodology pointed out the following
advantages and disadvantages:

• The absolute size of the blocks produced would be smaller
than the blocks produced using either PLAs or
channel-routed blocks. Additionally, the floor plan would be more
flexible than that produced by a PLA, allowing us to
partition the controller so that we could create control blocks
that fit into available area close to the circuits they must
control.

• We would have to pay more attention to timing because we
would no longer have the regular timing structure of the
PLA to guarantee that state budgets would be satisfied.

• The circuits produced would dissipate less power than
corresponding PLA implementations because the synthesize
and route methodology uses fully static circuitry. The circuits
would also be IDDQ compatible.

• We would have to design a new library of standard cells that
would be compatible with the over-the-cell router. We would
also need to design a new set of drivers that would drive
output signals from the standard-cell core to the rest of the
chip and that would be compatible with our production test
design rules. These tasks were very well-defined and we
understood the effort that would be required to complete
them.

• Of greater concern was the realization that the synthesis
path from the input equations to completed artwork would
be more complex than the corresponding path in the PLA
methodology and would be almost completely new.

† Over-the-cell routers place and route cells so that there is less need to provide routing
channels between the cells.

With the PLA methodology, we knew that there would be no
surprises. Incorporating this new technology would remove
much of that certainty. However, the benefits clearly
outweighed the costs. We felt that we couldn't afford to
compromise our power, area, timing, and test goals by continuing
with the PLA methodology.

We overcame several issues while making the new
methodology work for us. We leveraged the source code of many of
the control blocks from the PA 7100, all of which were
specified in the PLA source language. We were able to leverage
existing PLA sources directly by using the PLA tools to
generate sum-of-products equations in a form that the Synopsys
synthesis tool could understand. Synopsys was then free to
massage the equations into a more optimal form. Source
code development of these leveraged control blocks
continued using the PLA source language, even though we were
using the new methodology for synthesis and route. We
developed control blocks that were new for the PA 7100LC
using the Verilog behavioral description language, which has
a more direct input path to Synopsys.

We chose the Cell3 router from Cadence Systems Inc. to
perform the place and route portion of our new
methodology. The main issue remaining was how to integrate this
new tool with our other tools. To minimize the number of
costly licenses we needed to purchase and to maximize the
block designers' productivity, we decided to use our existing
artwork editor as a front end to the router's floor planning
capability. This approach allowed designers to preplace
critical cells, power nets, and clock nets easily. We developed
new tools that would translate this floor plan into a form
that the Cell3 router could understand. While these
techniques maximized designer productivity and minimized
license cost, we found that it was sometimes difficult to
isolate bugs in the methodology to either our front-end tools or
to the Cell3 router itself.

We also discovered that the timing capabilities of the
version of Synopsys that we used were less robust than we had
believed at the beginning of the project. This discovery had
only a minimal impact on blocks that were leveraged from
PLAs because of the regularity in the timing of those blocks.
However, to ensure robust timing on the remaining blocks,
we needed to develop new tools. The need for these
unanticipated workaround tools had a negative impact on our
schedule.

As with PLAs, we also found that certain types of circuits do
not map well to the synthesize, place, and route methodology.
On a large block where we made much use of the timing
flexibility offered by static standard cells, we found that our



24 April 1995 Hewlett-Packard Journal



© Copr. 1949-1998 Hewlett-Packard Co.



[Figure: block diagram of the PA 7100LC processor. Labeled blocks:
memory and I/O control, floating-point control, instruction
execution and sequencing control (4 blocks), level 1 instruction
cache, translation lookaside buffer (TLB), floating-point execution
unit, integer execution units 1 and 2, and the external cache
interface, with connections to the General System Connect (GSC) bus,
DRAMs, and SRAMs.]

Fig. 1. A simplified block diagram
of the PA 7100LC showing the
relationship between the control
blocks and the other major blocks
in the processor. The instruction
execution and pipeline sequencing
control block consists of four
separate blocks that are physically
distinct but highly interconnected.
Not all of the control connections
on the PA 7100LC are
shown in this figure.



synthesis tools were sometimes unable to produce circuits
that met the timing and area constraints of the block.
Whenever this occurred, we had to redesign the control source so
that the synthesized circuits could meet their physical
requirements, or help the tools by hand-designing portions of
the circuit.

We found that on some of the standard-cell blocks leveraged
from the PA 7100, the synthesis tools had difficulty creating
circuits that performed as well as their PA 7100
counterparts. This difficulty was caused in part by differences in the
standard-cell libraries for the two chips. The PA 7100LC
library had no pseudo-NMOS circuits, which were used quite
effectively to meet timing on the PA 7100 (at the expense of
higher power dissipation). The rest of the difference lies in
the fact that, for all its sophistication, automated synthesis
is still no match for carefully hand-designed blocks.
Fortunately, our design tools allowed us to hand-design portions
of the block while synthesizing the rest of the block.
Although it was time-consuming, we chose this approach in cases
where the tool path was unable to provide a satisfactory
solution.

The overall results of the methodology we chose were good.
We were able to partition the PA 7100LC's control
functionality into seven primary control blocks. Four of the blocks
control the sequencing and execution of instructions by the
pipeline. The remaining three control blocks control the
memory and I/O subsystem, the cache subsystem, and the
floating-point coprocessor (see Fig. 1). Together, these
seven blocks represent only iri% of the total die area, and
implement nearly all of the control algorithms and protocols
used by the PA 7100LC.

Even though the PA 7100LC adds integer superscalar execution
and a memory and I/O controller compared to the PA
7100, the area of the control core produced by the new
methodology is about half the area of the PLA core of the PA
7100. The area occupied by the driver stacks in the control
blocks on the two chips is about the same.

The new methodology implemented all of the control blocks
correctly and introduced no functional bugs. The timing
methodology that we had in place by the end of the project
was very effective at identifying problem timing paths
before they made it onto silicon. When we received chips from
manufacturing, we found no problem timing paths in any of
the control blocks that were created using the new
methodology.

Verification Methodology

One of the most prominent design goals for the PA 7100LC
was to meet the schedule required to enable a very steep
production ramp. This goal, coupled with Hewlett-Packard's
commitment to quality, meant that we needed to have in
place a solid plan to verify the correctness of the chip at all
stages of its design.

Our design goals and the knowledge that the PA 7100LC
was to be the most highly integrated CPU that HP had ever
created led us to focus early on the methodology that we
would use to verify the chip. As shown in Fig. 2, our
verification methodology included several distinct forms of
verification, some of which occur before silicon is
manufactured (presilicon verification) and some of which occur
after first silicon appears (postsilicon verification).

Presilicon verification activities included:

• Creating software behavioral models through which we
could verify the correctness of either the entire design or
portions of it

• Creating switch-level models of the implementation to
ensure that the implementation matched the design

• Writing test cases that provided thorough functional
coverage for each of these models

• Using in-circuit emulation to increase vector throughput and
to provide an orthogonal check of the chip's correctness.

Postsilicon verification activities included:

• Augmenting functional coverage by running hand-generated
test cases, randomized test cases, and application software

• Testing actual silicon against its electrical specification
using a rigorous electrical testing procedure.

We designed each portion of our verification methodology to
ensure that we could meet our schedule and quality goals.
The following sections describe in more detail the types of
verification we used.

A New Strategy 

At the time work was starting on the development of the
PA 7100LC chip, HP was moving toward a new product
development philosophy, which had as its basis the fact that
HP could no longer afford to do everything for itself. The time
had come to specialize in core competencies and look to
outside vendors to cover the needs common in the industry.
Unless HP provided a clear competitive advantage over
industry-standard tools and methods, design teams were
encouraged to adopt these standards, paying others to develop
and maintain leading-edge tools and processes.



During the PA 7100LC investigation phase, engineers
investigated industry-standard tools in the areas of behavioral
simulation, static timing analysis, fault grading, timing
verification, switch-level simulation, and other areas of chip
verification. The first and foremost goal of these investigations
was to determine which tools provided the fastest and most
efficient contribution toward design and verification, ultimately
leading to earlier products. The following section will
provide an analysis of our behavioral simulator selection, which
is just one example of the many tool decisions we made for
the PA 7100LC.

Behavioral Simulation. Before the PA 7100LC development
effort, we had been using a proprietary simulator which was
written and maintained by an internal tools group. With the
standardization of simulation languages in the industry, we
questioned the value of high internal development and
maintenance costs for this tool. We investigated the language and
simulator options available in the industry and eventually
reached a final list of choices:

• The proprietary HP solution

• Verilog

• VHDL (IEEE Standard 1076).

Other HP design labs, responsible for graphics and IC
hardware design, had migrated to Verilog from the HP simulator
and had found significant improvements in simulation
throughput on their ASIC designs. The throughput disadvantage
of the HP simulator was somewhat balanced by the fact
that it carried no licensing fees, was fully robust, and had
been proven capable of simulating a large, custom IC design
such as a CPU.

Verilog had become a de facto standard in the U.S. for
high-level and gate-level simulation in 1992 and had been used
extensively in HP's graphics hardware and IC design labs.
Their experience indicated that Verilog was very robust and
that it allowed personalized extensions through linking with
C code. The IC design lab demonstrated simulation speeds
with Verilog that were about seven times faster than the
internal HP simulator. Since Verilog was becoming more
common within HP, it would ease our task of sharing and
combining simulation models with design partners. For
example, the floating-point circuits that we would be leveraging
from the PA 7100 for the PA 7100LC were modeled in
Verilog. The graphics chip and the LASI chip used in the
Model 712 workstation were being developed using Verilog,
and many of the commercial ICs used in the system had
Verilog models available for system simulation. By choosing
Verilog, we would create a homogeneous environment. We
also felt that Verilog's C-like syntax would allow engineers
to learn the language quickly. Finally, the Verilog language
would provide a bridge to other useful industry-standard
tools for static timing, fault grading, and synthesis.

At the time we were investigating simulators we found only
one supplier who could provide a mature Verilog simulator
in our required time frame. This particular simulator had
some disadvantages compared to our internal simulator,
which included higher main memory requirements and the
need to recompile the simulation model at each invocation
of the simulator. For large models, this compile phase could
last a full minute. The internal simulator, by contrast,
compiled the model once into an executable program which
contained the simulation engine, and incurred no run-time
startup penalty. Also, because Verilog was licensed we
would have to purchase sufficient licenses to cover our
simulation needs, which would present a large initial expense.

A third major simulation language we investigated was VHDL
(IEEE Standard 1076). While Verilog was becoming a de facto
standard in the United States, VHDL was sweeping Europe.
VHDL shared many advantages and disadvantages with
Verilog. Simulation models of commercial system chips
were often available in both languages. VHDL provided
hooks to support industry-standard tools for timing, fault
grading, synthesis, and hardware acceleration. VHDL was
also licensed and would be expensive. The primary
differentiator between VHDL and Verilog was in ease of use and
ease of learning. Other HP design labs indicated that VHDL
was more difficult to learn and use than Verilog. Also, there
was no local expertise in VHDL, while proficiency in Verilog
had been growing, and significant inroads had already been
made at integrating Verilog into the remainder of our tool
set.

With this information in mind, the PA 7100LC technical team
decided to use Verilog as the modeling language for the PA
7100LC processor. The compelling motivations for this
choice were:

• The demonstrated success of other HP labs in using the
Verilog simulator in ASIC designs

• The availability of local expertise and support for the
simulator and modeling language

• The ability to standardize on a single simulator and modeling
language for the development of all custom VLSI used in
the HP 9000 Model 712

• The ability to interface easily to other industry-standard
tools.



Given this decision, we joined an effort with other design
labs to enhance the Verilog simulator to include an
improved user interface and more tool interfaces to be used
throughout our verification effort.

Turn-on Process. We migrated to the Verilog modeling
language and simulator in two steps. First, we validated that
Verilog could simulate an existing PA-RISC design of
comparable complexity to the PA 7100LC by converting the PA
7100 simulation model (from which the PA 7100LC design is
leveraged) into Verilog. Second, we used the knowledge that
we gained during this conversion process to complete the
development of the PA 7100LC.

Converting the PA 7100 simulation model into Verilog was a
good decision for several reasons. We wanted to start with a
known functional model from which we could leverage. We
also needed to confirm that Verilog was robust and accurate
enough to model a design as large and complex as a CPU.
The PA 7100 offered a hierarchical, semicustom design
model that consisted of high-level behavioral blocks (e.g.,
the translation lookaside buffer) and FET descriptions (e.g.,
in custom leaf cells). This varied design would provide a
good test of the simulator's ability and would help us to
learn about Verilog's unique requirements.

To aid the conversion process, we created a tool that
converted the HP proprietary modeling language to Verilog
syntax. We fixed code by hand wherever the two languages did
not have similar constructs or where they evaluated similar
constructs differently. The converted model passed its first
test case within two months.

Once the PA 7100 model was up and running in Verilog, we
measured its simulation throughput. Instead of the expected
7x speedup, we discovered a full 4x slowdown compared to
the HP simulator. We also found that the model consumed
more memory than we had anticipated. Through careful
analysis and support from our supplier, we learned that
much of our model syntax was very inefficient. In addition
to inefficiencies created by the translation tools, many
syntax structures that were optimum in HP's simulator were
nonoptimal in Verilog. Profiling and correcting these
inefficiencies greatly improved performance and resource
requirements.

Results. The result of the decision to use Verilog to model
the PA 7100LC was positive, with a few disappointments.
The main disappointment was that the Verilog model of the
PA 7100LC achieved only parity in throughput and required
five times more memory than the HP simulator.

However, Verilog brought strengths in other areas. Verilog
allowed us to make incremental changes to the model
quickly and easily. Verilog enabled us to capitalize on
industry-standard tools in the areas of synthesis, timing, fault
grading, and in-circuit emulation. We were able to use a
single modeling language across all of the custom
components in the HP 9000 Model 712 workstation and to obtain
compatible models for many of the external components.
We soon learned to use the new strengths provided by Verilog
and became efficient in using the language and the new
simulator. Verilog successfully modeled all constructs required
in the PA 7100LC design, and a high level of quality was the
end result of using this tool.

Presilicon Functional Verification 

Because the cost and lead time of manufacturing CPU die are
so great and because our system partners depend on fully
functional first silicon to meet their schedule goals, it is
important that our presilicon verification methodology give us
high confidence in the functional quality of the first silicon.
This task proved to be a challenge for the PA 7100LC chip
because it was designed by many engineers, and its feature
set is extensive and complex. These factors introduced the
opportunity for design and implementation bugs.

The PA 7100LC is the first HP processor chip to integrate the
memory and I/O controller on the same die as the CPU. In the
past, these designs lived on separate die and were owned by
separate project teams. The verification efforts for the two
designs were mostly independent. A careful specification of
the interface between the two designs allowed this approach
to succeed.

We realized that even though the PA 7100LC would integrate
the memory and I/O controller onto the CPU die, it would be
more effective to verify the memory and I/O controller
separately from the CPU core for the majority of the tests. This
would allow test cases for both the CPU and the memory and
I/O controller to be more focused, smaller, and faster to
simulate than they would be in a combined model. We created a
well-defined interface between the CPU and memory and
I/O controller to enable this approach.

Each of these presilicon verification efforts was structured
as shown in Fig. 2. First we created a behavioral model for
the portion of the design whose function was to be verified.
A behavioral model represents the design at some level of
abstraction, and typically moves from very high-level to
much more specific as the project progresses. As mentioned
above, we chose Verilog as the modeling language for our
design.

The behavioral model was the heart of the simulation
environment that would enable us to verify the CPU and the
memory and I/O controller. Our job was to find deficiencies
in this model. However, to do this we needed a way to
stimulate the model, observe its results, and ensure that its
behavior was correct. To meet these needs, we created
additional software objects to complete the simulation
environment.

At each of the external interfaces of the behavioral model,
we created custom code that was capable of modeling the
behavior of the device on the other side of the interface and
of stimulating and responding to the interface as appropriate
for that device. For example, these stimulus-generating
software objects were used in our simulation environment in the
same way that dynamic RAM, external cache, and I/O devices
are used in a physical system. We authored the code that
models these objects in a high-level language (typically C).

Another type of custom software that augments the
simulation environment consists of checkers. A checker monitors
the behavioral model and checks aspects of model behavior
for correctness. We used a number of different checkers
during the PA 7100LC verification effort. Some checkers
were very focused (e.g., a protocol checker on the I/O bus),
and others were more global (e.g., the PA-RISC architectural
simulator).

Creating "watchdog" pieces of code to detect and signal
errors automatically in the simulation environment helped
us to maintain our schedule. Previous CPUs had an
independent model of the design that matched the behavioral model
state-by-state for all external pads and architected internal
state.* Creating the independent model was time-consuming
and not easily broken into small pieces that could be
worked on in parallel. We couldn't run test cases on the
behavioral model without a fully functional independent
model. Replacing this independent model with a collection
of checkers allowed us to create multiple checkers at the
same time. We were able to turn on the checkers
independently as the functionality that they checked became
available in the behavioral model. Also, the checkers didn't need
to be fully functional for us to run useful test cases.
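
As a rough illustration of the checker idea (not the environment actually used on the PA 7100LC, which was built around Verilog and C), the following Python sketch attaches independent checkers to a toy model. The names ToyModel, BusProtocolChecker, and RegisterZeroChecker, and the rules they enforce, are hypothetical and chosen only for this example.

```python
class ToyModel:
    """Hypothetical stand-in for a behavioral model: a request/grant bus plus one register."""

    def __init__(self):
        self.cycle = 0
        self.bus_request = False
        self.bus_grant = False
        self.reg0 = 0

    def step(self, request, grant, reg0):
        # Advance one simulated cycle with the given signal values.
        self.cycle += 1
        self.bus_request = request
        self.bus_grant = grant
        self.reg0 = reg0


class BusProtocolChecker:
    """Focused checker (assumed rule): a grant must never appear without a request."""
    name = "bus_protocol"

    def check(self, model):
        if model.bus_grant and not model.bus_request:
            return f"cycle {model.cycle}: grant without request"
        return None


class RegisterZeroChecker:
    """Independent checker (assumed rule): reg0 is hardwired to zero."""
    name = "reg0_zero"

    def check(self, model):
        if model.reg0 != 0:
            return f"cycle {model.cycle}: reg0 = {model.reg0}, expected 0"
        return None


def run(stimulus, checkers):
    """Drive the model with stimulus and let every enabled checker watch each cycle."""
    model = ToyModel()
    violations = []
    for request, grant, reg0 in stimulus:
        model.step(request, grant, reg0)
        for checker in checkers:
            message = checker.check(model)
            if message is not None:
                violations.append((checker.name, message))
    return violations


stimulus = [(True, True, 0), (False, True, 0), (False, False, 5)]

# Only the bus checker is ready: the reg0 problem goes unreported for now.
early = run(stimulus, [BusProtocolChecker()])
# Later, both checkers are enabled and both problems are flagged.
full = run(stimulus, [BusProtocolChecker(), RegisterZeroChecker()])
```

Because each checker is a separate object, useful runs are possible with only some checkers enabled, which mirrors the incremental turn-on described above.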

The final aspect of the simulation environment is the test
case. A test case provides initialization to the model and the
stimulus-generating software objects and then orchestrates
the stimulus generators to provide external stimulus while
the model is simulating. The checkers constantly watch
model behavior and identify rules that the model violates.
The test cases are not self-checking. They simply stimulate
the model and rely on the checkers to ensure that the model
responds correctly.

We wanted the test cases to create the complex interactions
in the CPU core and in the memory and I/O controller that
are necessary to find subtle bugs. The model, stimulus
generators, and checkers provide an environment that makes it
easy to generate short, powerful test cases. To improve test
case coverage, we gave the responsibility for test case
creation to both the CPU and the memory and I/O controller
designers, who had a detailed knowledge of the internal
operation of the chip, as well as to independent verification
engineers, who knew only the external functional
specification of the chip. We used design reviews to ensure that our
suite of test cases adequately covered all functionality
present in the design.

Testing on the behavioral model is the first line of defense
against flaws in a design. To ensure that our implementation
matched the design, we ran our full suite of test cases on a
gate-level behavioral model. We created this model from the
complete chip schematics. We also tested a switch-level
model that we created by extracting the FET netlist from
the completed chip artwork. Since this was the same
artwork that manufacturing would use to fabricate the chip,
this regression served as a final test of the functional
correctness of both design and implementation.

* In this usage, architected state refers to a particular pattern of ones and zeros
on internal chip nodes.

To ensure that there were no coverage holes in the interface
between the CPU and the memory and I/O controller, we
created a model that merged these two designs into a single
behavioral model of the entire chip. We tested this model to
gain certainty that both parts would work properly together.

Finally, we combined behavioral models of the PA 7100LC
with behavioral models of other chips in the system and
performed system-level verification to ensure that each of
the chips interpreted the interchip interfaces consistently
and to ensure that all the chips in the system functioned as
expected.

Using this extensive verification methodology, the first silicon
we delivered allowed us to boot the HP-UX operating system
and enabled our system partners to progress towards meeting
their system schedules.

Postsilicon Functional Verification 

Presilicon verification, while providing an excellent first
pass at ferreting out design or implementation flaws, is not
capable of identifying all bugs in a complex custom CPU
such as the PA 7100LC. Two factors make this true. First, the
simulation speeds of even high-level behavioral models
(typically less than 10 Hz) are not sufficient to exercise all the
interesting state transitions within the CPU in the time
available. Second, experience has shown that in a chip of this
type there are sometimes subtle differences between the
presilicon model and actual chip behavior.

To ensure a quality CPU design, we performed extensive
postsilicon testing on the PA 7100LC in systems running at
actual processor speeds (50 to 100 MHz). The difference of
about seven orders of magnitude in vector throughput
between running test cases on presilicon models and code
running on actual silicon underscores the potential for
thorough testing offered by postsilicon verification.
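
The "seven orders of magnitude" figure follows directly from the two rates quoted above. The short Python calculation below is only a back-of-the-envelope check; it assumes the 10 Hz upper bound means ten simulated clock cycles per second of wall-clock time, and the replay-time figure is a hypothetical illustration rather than a number from the project.

```python
import math

SIM_RATE_HZ = 10.0                 # quoted upper bound for behavioral simulation
SILICON_RATES_HZ = (50e6, 100e6)   # quoted range of actual processor speeds


def orders_of_magnitude(fast_hz, slow_hz):
    """Return log10 of the throughput ratio between two clock rates."""
    return math.log10(fast_hz / slow_hz)


# Gap between silicon and simulation at each end of the quoted range.
gaps = [orders_of_magnitude(rate, SIM_RATE_HZ) for rate in SILICON_RATES_HZ]

# Hypothetical illustration: wall-clock time for the simulator to replay
# one second of execution at 100 MHz.
seconds_to_replay = 100e6 / SIM_RATE_HZ
days_to_replay = seconds_to_replay / 86400

print(f"gap: {gaps[0]:.1f} to {gaps[1]:.1f} orders of magnitude")
print(f"one second at 100 MHz takes about {days_to_replay:.0f} days to simulate")
```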

One of the goals of presilicon testing is to ensure that the
simulation model matches the behavior specified by the
design. We carried this goal into postsilicon testing and ran a
suite of tests on actual chips in a computer system. The
tests behaved the same when they were run in the computer
system as they did on the PA 7100LC presilicon models.

We knew that postsilicon testing would be the last
opportunity to find functional problems with our processor before
we shipped systems to customers. Since the cost of finding a
serious functional problem once systems are shipped is
extremely high, we wanted to exercise the processor
thoroughly with as many different tests as possible. The
variety of features that we had added to the PA 7100LC
made this process more difficult. Each of these features had
to be tested, usually in combination with other features.

The tests that we used during the PA 7100LC postsilicon
verification effort included:

• A collection of handwritten tests, run in an environment
that made them more stressful for the processor

• Random code generators that produced software that
deliberately stressed complex areas of the processor

• A collection of application software including operating
systems, benchmarks, and other applications.



Handwritten Tests. Hewlett-Packard has created a library of
programs whose purpose is to ensure that a processor
conforms to the PA-RISC architecture. In addition to this library,
we created other programs to test specific processor
features. We also created a small operating system that allowed
many of these programs to run simultaneously and
repetitively in a manner that was stressful to the processor. This
operating system would interrupt the programs at different
intervals and also change portions of the processor state
(e.g., cache and TLB) before restarting a program. Finally, the
operating system kept an extensive log of program activity
to help us track down bugs that it found.

In addition to the programs that we ran under the special
operating conditions, we created another set of handwritten
tests specifically to test the memory and I/O controller
portion of the processor. These tests used an I/O exerciser card
to ensure that the memory and I/O controller would behave
properly in the presence of any conceivable I/O transaction.
We also used these tests to exercise the DRAM interface of
the memory and I/O controller.

Focused Random Testing. To supplement the handwritten
tests we developed two random code generators.
Experience gained during past processor designs had taught us
that a certain class of bugs appear only when a number of
complex interactions occur within the CPU. It wasn't
feasible to create handwritten tests to cover all of these
interactions because the time requirements to do so would be
prohibitive. Additionally, some of the tests would need to cross
so many interactions that it would be difficult to guarantee
adequate coverage with handwritten cases. Using a random
code approach, we used code generators to create the test
cases that found bugs in this class.

Another strength of the random code approach was that we
were able to take full advantage of the speed of postsilicon
testing. We could run all handwritten tests in a short time on
an actual processor. Random code generators made it
possible to generate millions of different tests to keep the
processor fully exercised, at speed, for long periods of time.

One could create many conceivable random code
generators, which could differ in many ways including the type of
code produced, fault latency, ease of debugging,
repeatability, and initialization. Design differences in random code
generators cause coverage differences (one generator may
be able to find a bug that another missed). Random code
generators mainly differ in the sequence of instructions and
in what constitutes initial and final processor state. In
general it is best to run code from as many different sources as
possible to ensure the best coverage.

Of the two random code generators that we developed, one
stressed the floating-point unit and another stressed the
integer unit. Each of these generators produced tests
consisting of:

• An initial processor state

• A sequence of PA-RISC instructions

• An expected final processor state.
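
The three-part test structure listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the generators used on the PA 7100LC: the three-operation instruction set, the four-register machine, the reference_execute model, and the deliberately faulty buggy_execute "device" are all hypothetical.

```python
import random

MASK = 0xFFFFFFFF  # assume 32-bit registers
NUM_REGS = 4       # hypothetical tiny register file
OPS = {
    "ADD": lambda a, b: (a + b) & MASK,
    "SUB": lambda a, b: (a - b) & MASK,
    "XOR": lambda a, b: a ^ b,
}


def reference_execute(state, program):
    """Reference model: compute the final register state for a program."""
    regs = list(state)
    for op, dst, src1, src2 in program:
        regs[dst] = OPS[op](regs[src1], regs[src2])
    return regs


def generate_test(seed, length=8):
    """Build one test: initial state, instruction sequence, expected final state."""
    rng = random.Random(seed)  # seeding keeps every test repeatable
    initial = [rng.getrandbits(32) for _ in range(NUM_REGS)]
    program = [
        (rng.choice(sorted(OPS)), rng.randrange(NUM_REGS),
         rng.randrange(NUM_REGS), rng.randrange(NUM_REGS))
        for _ in range(length)
    ]
    expected = reference_execute(initial, program)
    return {"initial": initial, "program": program, "expected": expected}


def run_test(test, device_execute):
    """Run a test on a device and compare against the expected final state."""
    return device_execute(test["initial"], test["program"]) == test["expected"]


def buggy_execute(state, program):
    """Hypothetical faulty device: SUB forgets to wrap its result to 32 bits."""
    regs = list(state)
    for op, dst, src1, src2 in program:
        if op == "SUB":
            regs[dst] = regs[src1] - regs[src2]
        else:
            regs[dst] = OPS[op](regs[src1], regs[src2])
    return regs


# A directed case that exposes the fault: 0 - 1 must wrap to 0xFFFFFFFF.
directed = {
    "initial": [0, 1, 0, 0],
    "program": [("SUB", 2, 0, 1)],
    "expected": reference_execute([0, 1, 0, 0], [("SUB", 2, 0, 1)]),
}
```

Seeding the generator is one way to get the repeatability mentioned earlier: a failing test can be regenerated exactly from its seed.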

The focused random approach worked extremely well during
the PA 7100LC verification effort. Using it, we were able to
complete thousands of machine-hours of testing and identify
a majority of postsilicon bugs.

Our decision to emphasize random code testing paid off.
Because of the proven effectiveness of the random approach,
we will probably continue in this direction and make
evolutionary changes to make the approach even more effective.

Application Software. In addition to handwritten and random
tests, we ran a variety of "real-world" software applications
to further ensure that we had found and fixed all bugs.
These applications were intended to help diagnose failures
suspected to be caused by the hardware. We booted
operating systems (like HP-UX) shortly after chips were available.
We also conducted long-term operating system reliability
tests when more stable hardware and software became
available. We filled out our array of application software
tests with benchmark suites and other applications.

Acceptance Criteria. A challenging question that engineers
and managers face during any postsilicon verification effort
is "When are we done?" Having clear criteria for the quality
required to ship the chip to customers is paramount. For the
PA 7100LC, we used the following acceptance criteria:

• All failures are diagnosed to root cause. 

• No chip failures exist. 

• All handwritten code works. 

• Random code generators have run for a long time without
finding any failures.

• Application software has run without any indication of
hardware bugs.

In-Circuit Emulation 

In addition to constantly tuning existing design and
verification methodologies in areas where high-impact productivity
gains are essential to stay on the leading edge of the industry,
we also look for new breakthrough technologies and areas
for paradigm shifts. We considered in-circuit emulation as
such an area for the PA 7100LC.

In-circuit emulation means that a chip is modeled at the gate
level in field-programmable gate arrays (FPGAs) and
connected directly to a chip socket in a real system running at a
reduced frequency. This allows the modeled chip to run real
system-level software.

Continual increases in chip complexity must be countered
with more effective verification to ensure high-quality
first-silicon chips. The goal is to have a perfect chip, but the
requirement is to prevent masking bugs. A masking bug is a
serious bug that causes a class of chip functionality to fail.
The verification team is unable to "see behind" the bug to
test for other failures in that area of functionality. The chip
must be redesigned to fix the masking bug and must pass
through fabrication before this functionality can be tested.
Emulation was viewed as a way to prevent these serious
masking bugs.

Besides ensuring high-quality first silicon, it is also desirable
to have enough presilicon simulation throughput to verify
any proposed postsilicon bug fix. Since turning a chip is
costly and time-consuming, incorrect bug fixes that cause
additional bugs must be eliminated.



During the early phases of the PA 7100LC chip design effort,
in-circuit emulation technology came of age and was
available through external vendors. We investigated this new
technology in depth. For us, in-circuit emulation was viewed
as a paradigm shift in verification and very attractive because
it would:

• Provide near "real hardware" throughput with a presilicon
model

• Allow thorough regression of any mask or full chip turns
necessitated by bugs or timing paths found during postsilicon
verification

• Allow the firmware and software teams to test their code
before real hardware was available

• Add another important debugging capability to our suite of
debug tools that allow us to isolate postsilicon bugs

• Allow us to recreate real hardware failures on a presilicon
model and allow visibility to all internal nodes of the chip.

We also saw some areas of concern in pursuing in-circuit
emulation. We perceived in-circuit emulation as challenging
and risky because it was a new technology within a very
young industry. We lacked expertise in using emulation
tools, and it would be expensive to gain the necessary
expertise to make in-circuit emulation part of our chip design
methodology. In addition to this, the emulation tools and
hardware were very expensive.

Our concern with technology risk was eased by several
factors. We were promised very strong (on-site) support from
the emulation company that we chose. They assured us that
tools capable of handling large designs would be available
early in our design cycle. We had independent corroboration
from other HP entities, who had seen great success with
emulation in ASIC design efforts.

After weighing the potential advantages, risks, and our
long-term needs, we determined to pursue in-circuit emulation.
We didn't believe that emulation was absolutely critical to
our success on the PA 7100LC, but we felt that dramatic
improvement in simulation throughput would be required to
verify the increasing complexity of our next-generation
processor design. This effort was simply the first step in a
long-term strategic direction.

Emulation Methodology 

The real goal of our emulation effort was to plug the
emulation model into the physical system and run at frequencies
near 1 MHz. The team modified an HP 9000 Series 700
workstation to provide the required boot ROM, disk, and I/O
subsystem. A special processor board was designed that allowed
the emulation system to plug into the CPU socket. This board
also provided external cache (SRAM) and main memory
(DRAM). One challenge was to keep the DRAM refreshed,
since the processor wasn't running fast enough to keep
memory refreshed and make forward progress on the code stream
at the same time. We implemented a solution that coalesced
the processor memory transactions between refresh cycles
provided at a constant frequency by a module external to
the CPU. This made refresh transparent to the PA 7100LC
emulation model. Fig. 3 shows our emulation setup.

Along with these physical challenges, we also addressed
modeling issues. The emulation company provided an
on-site, experienced engineer to join our emulation team. The
preliminary goal was to take a substantial top-level block
netlist and prove that our style of custom design would
emulate successfully. We chose a block that contained many
unique and difficult-to-model elements. It contained custom
data path blocks and some control blocks, and included
some large regular arrays such as register stacks, TLB, and
internal cache. Because of their size and regular structure,
we chose to model the cache, register stacks, and TLB on
external component boards using TTL parts and PALs. We
turned to industry tools to translate our library of custom
cells into emulation gates, but quickly found that the tools
were incapable of generating accurate gate-level models. We
were forced to create handwritten translations for the
entire library to make progress.

Once we had completed this initial block, we ran the model
in cosimulation mode with a Verilog simulator. The emulation
hardware modeled our target block, while the Verilog
simulator modeled the rest of the PA 7100LC. The models
exchanged stable input and output values after every CPU
clock transition. This approach allowed turn-on and testing
of the external component boards as well as flushing out of
modeling issues.

Next, we attacked the full chip. Our emulation team created
a full chip model, which was partitioned and programmed
into the FPGAs in the emulation boxes. This became a painful
process as we learned that the hardware and software
had never been used on a design of this size, and fatal tool
failures stopped progress many times.

We achieved our first working model that ran through all the
firmware code shortly after the PA 7100LC chip achieved tape
release. We debugged all firmware code before first silicon
arrived from fabrication. This made silicon turn-on much
faster than would have been possible otherwise. We resolved
some nagging emulation failure modes in the difficult-to-model
floating-point circuits within one month of receiving
the first silicon chips. This emulation model allowed extensive
testing on the final chip specification before the masks
were released to fabrication. Only one hardware bug was
found using emulation.

From our emulation efforts we learned the following:

• Our method of custom VLSI design was difficult to model in
emulation gates. Many unanticipated race conditions were
found which had to be resolved. For example, we allow
races (e.g., between a latch's data signal and its enable
signal) that we can guarantee will be won on the chip.
However, with uncertain delays on these signals within the
FPGAs, these races are easily lost. We also found that
wire-OR logic is very difficult to model.

• We found that electrical characterization was the limiting
issue for shipping products in volume. Emulation does not
help this problem directly. Although it does help to prevent
masking bugs, it may not actually shorten the ship-release
date.

• Even though custom VLSI chips are much more difficult to
emulate than ASICs, in-circuit emulation is a viable
technology. As emulation technology matures, the effort required to
model complex CPUs will become more reasonable. Because
of the immaturity of in-circuit emulation technology at the
time we were using it, we were only able to make a minor
contribution to the development of the PA 7100LC with this
technology.

The learning curve for emulation technology was steep, but
this effort can be seen as successful when used as a
stepping stone to a new technology paradigm. We identified
many issues and shortcomings with using current emulation
technologies to accelerate vector throughput. We can now
continue to move towards either applying more mature
emulation technology or developing new approaches that
better address the issues that we identified.

Postsilicon Electrical Verification

The goal of postsilicon functional verification is to identify
failures caused by inappropriate logic within the chip. These
functional failures generally manifest themselves on every
chip that we manufacture and will be unrelated to the
operating point (e.g., temperature, voltage, or frequency) of the
CPU.

Electrical failures are another class of failures that we sought
out during the postsilicon verification effort for the PA
7100LC. Electrical failures cause the chip to malfunction
and typically have a root cause in some electrical
phenomenon such as:

• Ground or power supply noise on the board or chip

• Coupling between signals

• Charge sharing

• Variation in FET speed or drive capability caused by
variation in the manufacturing process

• Leakage-related phenomena

• Race conditions

• Unforeseen interchip circuit interactions.

Because the integrated circuit manufacturing process varies
slightly with time, electrical failures may or may not be
present on all chips that are produced. Further, certain operating
conditions will typically exacerbate the failure. Sometimes a
failure will occur at any operating point and can be difficult
to distinguish from a functional failure. However, most will
be dependent upon some parameter of the chip's operating
point.

To deal appropriately with failures of this class, we staffed
an electrical verification effort for the PA 7100LC that was
mostly independent from its functional verification
(described earlier). The goals of this effort were to:

• Identify, isolate to root cause, and repair all failures within
the operating range possible in customer systems

• Identify and isolate to root cause any failures within a
significant, well-defined region of margin outside of this
operating range.

The first goal is clearly necessary to provide quality systems
to customers. We created the second goal with the knowledge
that in some cases, understanding the root cause for
failures outside of our expected operating range would be
beneficial. Sometimes this knowledge would enable us to
make proactive design changes which would increase chip
yields, resulting in lower chip and system costs. Such
knowledge is also useful when moving the chip into a
higher-frequency range or a new process technology.

To meet these goals, we instrumented several systems so
that we could independently control each of the CPU supply
voltages and the operating frequency of the system. We
interfaced each set of controlling instruments to a host computer
which could systematically vary the operating point
parameters, direct the system under test to run a variety of possible
tests, and observe and log the results of those tests. We
placed each system under test in an environmental chamber
that was capable of varying the temperature from ^0°C to
100°C. In each system under test, we also varied system
parameters such as memory loading and I/O bus loading.
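
The host-controlled sweep described above can be sketched in a few lines. This is only an illustration, not the team's actual software: the function names, the `fake_dut` stand-in for a system under test, and every voltage, frequency, and temperature value below are assumptions chosen for the example.

```python
from itertools import product


def sweep(run_test, voltages, frequencies_mhz, temperatures_c):
    """Run run_test at every operating point and log the pass/fail result."""
    log = []
    for vdd, mhz, temp_c in product(voltages, frequencies_mhz, temperatures_c):
        log.append({
            "vdd": vdd,
            "mhz": mhz,
            "temp_c": temp_c,
            "passed": run_test(vdd, mhz, temp_c),
        })
    return log


def failing_points(log):
    """Return the operating points at which the test failed."""
    return [(r["vdd"], r["mhz"], r["temp_c"]) for r in log if not r["passed"]]


def fake_dut(vdd, mhz, temp_c):
    """Hypothetical device: fails only at low voltage combined with high frequency."""
    return not (vdd < 4.75 and mhz > 90)


log = sweep(fake_dut, [4.5, 5.0, 5.5], [60, 80, 100], [25, 100])
```

A failure that appears only in one corner of the sweep (here, low voltage and high frequency regardless of temperature) is the kind of operating-point dependency that hints at an electrical rather than a functional root cause.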

In the presence of an electrical failure and the appropriate
operating conditions, certain code streams will not evaluate
as expected. To ease the task of isolating electrical failures,
we created test code specifically for electrical verification
that stressed the various interfaces and functional units of
the chip in turn. Each segment of this test code would
indicate its progress as it ran. This allowed us to isolate a failure
quickly to a particular, very short segment of the test code.
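
As a rough sketch of why progress indications make isolation quick, the following assumes each segment announces its name before it runs, so the last name seen identifies the failing segment. The segment names and the `run_segments` helper are hypothetical; the actual test code was written for the PA 7100LC itself.

```python
def run_segments(segments):
    """Run (name, test) pairs in order and stop at the first failure.

    Returns the names announced so far and the name of the failing
    segment (None if every segment passed).
    """
    announced = []
    for name, test in segments:
        announced.append(name)  # progress indication, emitted before the segment runs
        if not test():
            return announced, name
    return announced, None


# Hypothetical segments; the cache segment is made to fail.
segments = [
    ("integer_alu", lambda: True),
    ("cache_interface", lambda: False),
    ("memory_interface", lambda: True),
]
announced, failed = run_segments(segments)
```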

In addition to this electrical verification code, we leveraged
the random code generators used by the functional verification
team, and ran the code sequences that they produced at
the corners of the PA 7100LC's operating region.

Using this data generating and collection system, we were
able to create graphs that indicated passing and failing code
sequences as a function of voltage, frequency, temperature,
system conditions, and IC process. By inspecting the
operating point dependencies (or lack of dependencies) of a failing
code stream, we could gain insight into the root cause for a
failure. To confirm our root cause analyses and potential
fixes, we created new handwritten test codes, altered
existing silicon using focused-ion-beam milling, and performed
electron beam probing of chips in systems.

The PA 7100LC's postsilicon electrical verification effort
ensured that the chip would perform well in a wide range of
electrical environments. It identified easily repaired yield
limiters that allowed us to maximize yield and minimize the
cost of the CPU. Each of these successes allowed our system
partners and customers to be more successful in meeting
their goals.

Debug and Test 

Since the PA 7100LC processor was designed to be the core
component of a low-cost workstation line, the factory cost
goals and expected volumes clearly indicated that careful
attention to ease of test and manufacturability was necessary.
The following test features were defined based upon design
and manufacturing needs:

• Parallel test vector capability in excess of 100 MHz

• IEEE Standard 1149.1-compatible boundary scan interface

• On-chip clock gating circuitry

• Retention of internal state when the chip clocks are halted

• Internal scan with single and double clock step capability

• Fully static operation to support off-chip IDDQ testing

• Signature analysis capability for testing the on-chip
instruction buffer

• At-speed capture of internal states by scan registers.

To meet manufacturing cost goals, the PA 7100LC had
aggressive quality and test time goals compared with our
previous processor designs. Both of these items significantly
affect final chip cost. A test methodology was developed
early in the design phase to facilitate the achievement of
these goals. The methodology encompassed chip test and
characterization needs and manufacturing test needs.

Testing is accomplished through a mixture of parallel and
scan methods using an HP 82000 semiconductor test system.
The majority of testing is done with at-speed parallel pin
tests. Tests written in PA-RISC assembly code cover logical
functionality and speed paths and are converted through a
simulation extraction process into tester vectors. Scan-based
block tests are used for circuits such as standard-cell
control blocks and the on-chip instruction buffer which are
inherently difficult to test fully using parallel pin tests. IDDQ
measurements are also performed after some parallel tests
to provide additional defect coverage. The parallel test
sequence is 600,000 states long, and 42 Mbits of scan vector
are used during scan testing.

Fig. 4. Simplified diagram of a PA 7100LC I/O driver. Static current can flow from VDD to ground in the inverters if the pad is not driven
to VDD or ground. For example, if the pad driver drives a one, the pad would be driven to 3.3V (VDL), which would cause static current to
flow, invalidating the IDDQ test. For IDDQ measurements, the pad is driven to 0V (ground) through the boundary scan circuitry and pad
driver.

To meet our test quality and cost goals, we implemented two
new chip test techniques that had not been used on previous
PA-RISC implementations: IDDQ testing and
sample-on-the-fly testing.

IDDQ Implementation

IDDQ testing is a test methodology in which the presence of
defects is detected by measuring dc current when the chip is
halted. Nondefective full CMOS gates draw static current
made up of leakage currents that are in the nA range.
However, defective gates can draw currents many orders of
magnitude higher. If a current measurement is made on the
power supply during a static state, a good chip will draw
very little current and a defective chip will draw much more.
IDDQ has high observability and detects many different types
of defects. It was decided early in the design of the CPU that
IDDQ test capability would be a desirable test feature. IDDQ
test capability was also desirable because it substantially
reduces static power consumption.

Design Rules. To support IDDQ testing, most of the circuits
leveraged from past PA-RISC implementations that drew dc
current were eliminated. For each case in which using a
circuit that drew static current was the only reasonable
design solution, the circuitry was redesigned to be disabled
with a test signal during IDDQ measurements. Most blocks
containing pseudo-NMOS circuitry were redesigned using
static CMOS circuitry. Dynamic circuits were modified to
eliminate static current and to retain state while the chip is
halted. No FET gate is allowed to be in a situation where it
could float if the clocks are halted because this could
possibly cause the FET to turn on. Internal pullups on input pins
are disabled during IDDQ measurements, including the IEEE
1149.1 test pins. No drive fights are allowed in a static state.
All nodes make a full transition to a supply rail, which is
accomplished through the use of restorative static feedback
when full CMOS transfer gates are not used in latches and
multiplexers. Any bus that could be completely tristated in
any state uses a bus holder circuit to maintain proper levels.

Special Considerations. The floating-point ALU, which was
leveraged from the PA 7100 processor, drew static current
and redesigning it was not feasible given our schedule
constraints. However, it is possible to eliminate the static
current during IDDQ measurements if the ALU is not evaluating
during the measurement. Since IDDQ testing was not going
to be used to test the ALU, this was acceptable. IDDQ testing
during parallel vectors is still possible, but if a floating-point
operation occurs that uses the ALU, the ALU loses its
internal state if IDDQ test mode is entered during the test.

Another area of consideration for IDDQ involved the I/O bit
slices. The CPU uses two power supplies, VDD and VDL,
which are nominally at 5V and 3.3V respectively. VDD
supplies all of the internal chip logic, while VDL is the supply for
the output driver pullup FETs. The input receivers on the
CPU normally draw static current when an output driver is
on that drives to VDL. In addition, a circuit to hold the
current value on the pad can draw static current if the pad is
not driven to VDD or ground. Therefore, when IDDQ
measurements are taken, the output drivers are driven to ground
through the use of the boundary scan circuitry to eliminate
static current flow in the receiver and pad holder circuits
(see Fig. 4). The parallel tester drives input-only pins to VDD
or ground as appropriate, including the IEEE 1149.1 interface
pins. The analog inputs of the clock buffers are also driven
to appropriate values to prevent static current.

These rules were easy to adhere to and followed our
rationale to increase test capability with little design impact.
IDDQ compliance was verified by running functional
simulation cases through an HP proprietary FET-level switch
simulator which also has the ability to check for static current
violations. Because of careful attention to the design
guidelines, only six IDDQ violations were discovered when the
simulations were run, all of which were easily resolved.

IDDQ Measurement

IDDQ measurements are taken using a parametric
measurement unit on the HP 82000 tester (see Fig. 5). When a
measurement is to be taken, a vector sequence is run to place
the device under test (DUT) into a static state. After the
dynamic current transients have settled, the measurement
unit is connected to the chip power plane with a relay, and
the regular VDD supply is then switched out with relays. The
parametric measurement unit then supplies and measures
the current flowing into the DUT. The power plane for the
DUT is separated from the test fixture power plane by relays
connected between the chip and the test fixture. Bypass
capacitors to control supply noise are placed on VDD on the
power supply side of the relays. This is important because
leakage currents in large electrolytic capacitors can be tens
of microamps, which would compromise the accuracy of the
measurement.

Typical measurements are in the range of 1 µA. The IDDQ
current is dominated by reverse bias leakage current and
subthreshold leakage. Measurements are taken during wafer
and package test, and four measurements are made. Four
parallel vectors are used, which initialize the registers,
cache, TLB, and other state logic to zeros or ones and two
patterns of alternating ones and zeros (to check for bridging
faults). This provides a great deal of defect coverage while
incurring minimal test overhead.
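
To illustrate why the two alternating patterns are needed in addition to all zeros and all ones, here is a small sketch. It makes simplifying assumptions that are not from the article: a bridge is modeled as a short between two state nodes that draws static current only when the nodes hold opposite values, and the pass limit passed to `passes_iddq` is hypothetical.

```python
def iddq_patterns(width):
    """Return the four state-initialization patterns for a register of the given width."""
    alt01 = [i % 2 for i in range(width)]
    return {
        "zeros": [0] * width,
        "ones": [1] * width,
        "alt01": alt01,
        "alt10": [1 - bit for bit in alt01],
    }


def excites_bridge(pattern, node_a, node_b):
    """Simplified model: a bridge draws static current only if the nodes differ."""
    return pattern[node_a] != pattern[node_b]


def passes_iddq(measured_amps, limit_amps):
    """Compare a quiescent current measurement against an assumed pass limit."""
    return measured_amps <= limit_amps


patterns = iddq_patterns(8)
# Which patterns would reveal a bridge between adjacent bits 2 and 3?
revealing = [name for name, p in patterns.items() if excites_bridge(p, 2, 3)]
```

Under this model the uniform patterns never drive adjacent nodes to opposite values, so only the alternating patterns expose a bridge between neighbors.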



IDDQ testing was very effective at catching defects on the PA
7100LC. Results indicate that 80% of scan test failures and
70% of parallel failures are caught by IDDQ testing. In
addition, other types of defects are caught that might not be
caught by conventional voltage-level testing, like gate oxide
shorts and some types of bridging faults. These can lead to
reliability problems over the life of the product, so it is
important to catch them at the chip test stage.

We plan to do more directed IDDQ testing on future chips,
using scan testing and parallel testing to set up and measure
current for specific chip states indicated by automatic test
generation tools. This should improve the level of coverage
we get for IDDQ tests. However, one problem that may occur
is that off-FET leakage will increase in the effort to improve
FET performance in future IC processes. This has a direct
effect on the ability of IDDQ techniques to resolve
low-current defects. Additional techniques like power supply
partitioning may be necessary to make IDDQ usable with more
advanced IC processes.

Sample-on-the-Fly Testing 

An interesting new feature that is implemented on the CPU
enables scan registers to capture the internal state of the
chip while the chip is operating at speed in a normal system.
We refer to this capability as sample-on-the-fly testing. The
sample is nondestructive, and the data can be accessed
while the chip continues to execute code by scanning the
results out using the on-chip IEEE 1149.1-compatible test
access port (TAP). This feature was very useful for debugging
and characterizing system-level performance because it is
essentially a logic analyzer built directly into the chip which
allows access to over 4000 internal state values. Samples can
be taken with any IEEE 1149.1-compatible test controller
and appropriate software.

Internal Sampling. The internal sampling capability allows a
sample to occur when the architected PA-RISC interval
timer reaches a count that matches a preset value in a
register and the TAP circuitry is in a specific state. In the PA
7100LC the interval timer on the chip is a 32-bit register that
increments by one for every clock cycle that occurs on the
chip. An additional 32-bit register provides a value to
compare with the value in the interval timer register. This value
can be set by doing a PA-RISC mtctl (move to control
register)† instruction. When the interval timer value matches the
value set by the mtctl instruction, a comparator circuit
generates a signal which is normally sent to the control logic to
cause an interval timer interrupt to occur. This signal is also
sent to the TAP in this implementation. If the current TAP
instruction is ISAMPLE, the state of the chip is sampled into
each scan register on the following chip state by allowing
each scan register to update during the phase when the
functional latch is not being updated. An indication that a
sample has occurred is sent from one of the test pins when
the sample is taken. The pin can be monitored by an
external IEEE 1149.1-compatible controller system to determine
when data can be shifted out of the chip. The shifting of the
sampled data does not corrupt the state of the internal logic.



† This instruction moves data to a control register. In this instance it is moving data to the
timer comparison register.

Fig. 6. Sample-on-the-fly testing process.

If another sample is desired, the above procedure is simply
repeated. Fig. 6 summarizes the sample-on-the-fly process.
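
The trigger mechanism can be modeled behaviorally as follows. This is a sketch under stated assumptions, not a description of the actual circuit: the class and attribute names are invented, the chip state is reduced to a small dictionary, and the TAP instruction is represented as a plain string.

```python
MASK32 = 0xFFFFFFFF


class SampleOnTheFlyModel:
    """Behavioral model: timer/compare match arms a sample taken on the next state."""

    def __init__(self, compare_value, tap_instruction):
        self.timer = 0                          # 32-bit interval timer
        self.compare = compare_value & MASK32   # value written with mtctl
        self.tap_instruction = tap_instruction
        self.pending = False                    # match seen, sample on next state
        self.scan_registers = None
        self.sample_taken = False               # models the indication on a test pin

    def clock(self, chip_state):
        """Advance one clock; chip_state is the functional state in this cycle."""
        if self.pending:
            self.scan_registers = dict(chip_state)  # copy, leaving the state intact
            self.sample_taken = True
            self.pending = False
        if self.timer == self.compare and self.tap_instruction == "ISAMPLE":
            self.pending = True
        self.timer = (self.timer + 1) & MASK32


def run(model, cycles):
    """Drive the model with a hypothetical program counter as the only state."""
    for i in range(cycles):
        model.clock({"pc": 0x1000 + 4 * i})
    return model


armed = run(SampleOnTheFlyModel(3, "ISAMPLE"), 10)
idle = run(SampleOnTheFlyModel(3, "BYPASS"), 10)
```

With a compare value of 3, the match occurs in the cycle where the timer reads 3 and the scan registers capture the state of the following cycle. With any other TAP instruction loaded, the match has no effect on the scan registers.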

Results. Although sample-on-the-fly testing capability required
careful electrical and timing design, it has proven to be very
effective for debugging. It was vital at system frequencies
approaching 100 MHz, since our traditional external
debugging hardware was unable to function at this frequency
because of electrical constraints. Sample-on-the-fly testing
became our only debugging tool in systems with
high-frequency critical paths. It was used several dozen times in
high-speed characterization and led to the resolution of
several slow timing paths. It is clear that as CPU frequencies
increase, more debugging circuitry will need to be included
directly on the chip to assist in diagnosing functionality,
speed, and electrical failures.

Debug Mode 

The sample-on-the-fly technique allowed us to observe the
values present at many nodes, at one very specific point in
time, and at any operating frequency. Since this test
technique uses the test access port to observe these values, it
provides information about the chip state at a relatively low
bandwidth. This information is an extremely valuable
diagnosis tool for designers because it enables them to know
exactly when a problem is occurring.



Sometimes, especially when a problem is not yet fully
understood, a higher-bandwidth path to diagnostic information is
useful to designers. To allow designers access to larger
amounts of information across broad slices of time, we
added a debug mode to the PA 7100LC. This mode makes
available externally the values of several key internal buses
and control interfaces, on a state-by-state basis.

Software can place the chip in the debug mode by executing
a series of CPU diagnostic instructions. Software can also
be used to choose a set of signals to be made externally
visible. These signal sets were carefully chosen by the chip's
designers as being indicative of the internal state of the CPU.
Examples of signal sets that can be made visible using the
debug mode include:

• Internal instruction and data buses

• CPU to memory and I/O controller interface

• Key cache controller state information.

When the chip is operating in the debug mode, it identifies
unused cycles on the I/O bus and uses them to drive the
selected debug information onto the I/O bus. The debug
circuitry can be programmed by software either to throw away
debug data during states when the I/O bus is unavailable, or
to cause the CPU pipeline to stall during these states so that
no debug information is lost.
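
The trade-off between the two programmable behaviors can be shown with a toy cycle model. Everything here is an assumption made for illustration: one debug word per CPU state, a per-cycle list saying whether the I/O bus is busy, and a bus that is treated as free once that list runs out.

```python
def drive_debug(bus_busy, debug_words, stall_when_busy):
    """Return (emitted, dropped, cycles_used) for one debug word per CPU state.

    bus_busy[i] is True when normal traffic occupies the I/O bus in cycle i.
    """
    emitted, dropped = [], []
    cycle = 0
    for word in debug_words:
        if stall_when_busy:
            # Stall policy: the pipeline waits until the bus has an unused cycle.
            while cycle < len(bus_busy) and bus_busy[cycle]:
                cycle += 1
        busy = cycle < len(bus_busy) and bus_busy[cycle]
        if busy:
            dropped.append(word)   # discard policy: this state's data is lost
        else:
            emitted.append(word)
        cycle += 1
    return emitted, dropped, cycle


bus_busy = [False, True, True, False, False]
words = ["s0", "s1", "s2"]
discard_result = drive_debug(bus_busy, words, stall_when_busy=False)
stall_result = drive_debug(bus_busy, words, stall_when_busy=True)
```

In this example the discard policy finishes in three cycles but loses two words, while the stall policy keeps all three words at the cost of two extra cycles.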

Externally driving debug information allows engineers to see
a sufficient amount of state information on a large enough
number of CPU states to be able to quickly direct further
efforts at locating postsilicon problems.

Both debug mode and sample-on-the-fly turned out to be
invaluable debugging aids in the highly integrated environment
of the PA 7100LC.

Conclusion 

Supporting design methodologies allow implementation of
the features that a product requires to meet its design goals.
The methodologies used to synthesize, place and route,
simulate, verify, and test the PA 7100LC processor were crucial
to the processor's success.

An I/O System on a Chip

The heart of the I/O subsystem for the HP 9000 Model 712 workstation is 
a custom VLSI chip that is optimized to minimize the manufacturing cost 
of the system while maintaining functional compatibility and comparable 
performance with existing members of the Series 700 family. 

by Thomas V. Spencer, Frank X Lettang, Curtis R. McAllister, Anthony L. Riccio, Joseph F. Orth, and
Brian K. Arnold



The HP 9000 Model 712 design is based on three custom
pieces of VLSI that provide much of the system's functionality:
CPU, graphics, and I/O. These chips communicate via a
high-performance local bus referred to as GSC (general
system connect). This paper will focus primarily on the I/O chip.

A major goal of the Model 712 I/O subsystem was to provide
a superset of the I/O performance and functionality
available from other family members at a significantly reduced
manufacturing cost. This goal was bounded by the reality of
a finite amount of engineering resources, and it was obvious
from the start that integrating several of the I/O functions
onto a single piece of silicon could greatly reduce the total
I/O subsystem manufacturing cost. Each function of the I/O
subsystem was examined individually as a candidate for
integration. The value of maintaining exact driver-level
software compatibility was also evaluated with respect to the
advantages of minimizing the hardware cost for each of the
I/O functions.

The investigation indicated that the optimal solution for the
Model 712 was an I/O subsystem that centered around a
single piece of custom VLSI. The chip that resulted from this
investigation directly implements many of the required I/O
functions and provides a glueless interface between the GSC
bus and other common industry I/O devices. This chip was
named LASI, which is an acronym that refers to the two
major pieces of functionality in the chip: LAN and SCSI.
The LASI chip also provides several miscellaneous system
functions that further reduce the amount of discrete logic
required in the system.

Chip Overview

The LASI chip was designed in a 0.8-µm CMOS process and
is 13.2 mm by 12.0 mm in size (including I/O pads). It
contains 520,000 FETs and is packaged in a 240-pin MQUAD
package. LASI dissipates approximately three watts when
operating at the maximum GSC frequency (40 MHz). LASI
was designed primarily using standard-cell design
methodologies although several areas required full custom design.

A functional block diagram of LASI is shown in Fig. 1. The
majority of circuitry in LASI is consumed by only two
functions, LAN and SCSI. Both of these designs were purchased
from outside companies and ported to HP's design process.
The SCSI functionality is exactly identical to the NCR
53C710 SCSI controller, and the LAN functionality is exactly
identical to an Intel 82C596 LAN controller.



Other I/O functionality that is completely implemented on
LASI with HP internal designs includes: RS-232, Centronics
parallel interface, a battery-backed real-time clock, and two
PS/2-style keyboard and mouse ports. In addition, LASI
provides a very simple way of connecting the WD37C65C flexible
disk controller chip to the GSC bus. The system boot ROMs
are also directly controlled by the LASI chip. The Model 712
provides 16-bit CD-quality audio and optionally supports
two telephone lines. LASI provides the GSC interface and
clock generation (using digital phase-locked loops) for both
of these audio functions. Fig. 2 shows an approximate floor
plan of the LASI chip. The floor plan shows the general
layout and relative size of each block.

LASI contains several system functions that help to minimize
the miscellaneous logic required in the system. This includes
GSC arbitration and reset control. LASI also serves as the
GSC interrupt controller.

It is possible to use up to four LASI chips on the same GSC
bus. LASI can be programmed at reset to reside in one of
four different address locations. The arbitration circuit
supports chaining, and LASI can be programmed to either drive
or receive reset.

System Support Blocks 

The following sections give a brief overview of each of
LASI's major functional blocks that provide system support
functionality in the Model 712, but do not directly support or
implement any I/O function.

GSC Interface. The GSC (general system connect) bus
connects the major VLSI components in the Model 712. It is a
32-bit bus with multiplexed address and data. The bus
consists of 47 signals for devices capable of being a bus master.
The GSC bus is defined to run at up to 40 MHz, giving a peak
transfer rate of 160 Mbytes/s.
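
The peak figure follows directly from the bus width and clock rate, as the short calculation below shows. It assumes one full 32-bit data transfer on every clock, which is what "peak" implies; because address and data share the same lines, sustained rates would be lower. The function name is illustrative only.

```python
def peak_rate_mbytes_per_s(bus_width_bits, clock_mhz):
    """Peak rate assuming one full-width data transfer per clock cycle."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_mhz


gsc_peak = peak_rate_mbytes_per_s(32, 40)  # 4 bytes x 40 MHz
```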

The GSC interface block in LASI provides the connectivity
between the GSC bus and the wide variety of internal bus
blocks, many of which have different logical and timing
requirements. This block converts the GSC bus to a less
complex internal LASI bus. The LASI internal bus is very similar
to the GSC bus, but it is not as heavily multiplexed and is
more flexible than the GSC bus in that it easily
accommodates the simpler interface for the general-purpose I/O
blocks in LASI. The GSC interface block handles bus errors
and keeps track of parity information for other internal
blocks, removing the associated complexity from these
controllers. Both master and slave devices reside on the LASI
internal bus.

LASI is a slave whenever the CPU initiates data transfer. As
a slave, LASI supports only subword and word writes, and
subword, word, and double-word reads.* Internal slave
devices only need to support a subset of these transactions.
There are five different protocol behaviors for slave devices
in LASI: unpaced byte wide, paced byte wide, packed byte
wide, unpaced word wide, and paced word wide.

Unpaced devices, such as the real-time clock, don't use a
handshake with the GSC interface, making their protocol
very simple. When a device requires a variable length of
time to transfer data it is called paced. The SCSI interface is
an example of a paced device. A packed device is one that
sends a sequence of bytes to make up a word or double
word. The boot ROM interface is an example of a packed
device.
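
As a small illustration of what a packed device implies for the interface, the sketch below assembles a sequence of bytes into a 32-bit word or a 64-bit double word. The byte order is an assumption: most-significant byte first is used here because PA-RISC is a big-endian architecture, but the article does not state the order used on the LASI internal bus. The function name is hypothetical.

```python
def pack_bytes(byte_sequence):
    """Assemble 4 bytes into a word or 8 bytes into a double word.

    Assumes the first byte received is the most significant.
    """
    if len(byte_sequence) not in (4, 8):
        raise ValueError("expected 4 bytes (word) or 8 bytes (double word)")
    return int.from_bytes(bytes(byte_sequence), byteorder="big")


word = pack_bytes([0x12, 0x34, 0x56, 0x78])
double_word = pack_bytes([0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77])
```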

* In PA-RISC a subword is typically one byte, a word is 32 bits, a double word is 64 bits, and a
quadword is 128 bits.



A simple strobe signal is asserted while internal data and
address buses are valid. Internal devices have no direct
interaction with bus errors.

As a bus master, LASI is capable of initiating subword, word,
double-word, and quad-word transactions on the GSC bus.
Once one of LASI's internal bus masters owns the bus, it can
signify the start of a transaction by asserting the master_valid
signal (see Fig. 3). The device must then simultaneously
drive its DMA address (master_address), transaction type, and
byte enables onto the bus. On a read, the first available data
word will appear on the internal bus when the
master_acknowledge signal is asserted by the GSC interface. The GSC
interface will not accept another master_valid until all the read
data has been transferred.

If a timeout error, address parity error, or data parity error is
encountered on the GSC bus, the GSC interface will always
do a normal handshake for the transaction by asserting the
master_acknowledge signal. The transaction will complete as
usual except that an error is logged, disabling arbitration for
the device so it cannot be a bus master again. This means
that internal masters, at the hardware level, never need to
respond directly to bus errors. When the GSC interface
block sees a timeout error it will, from the perspective of its
internal bus blocks, complete a transaction normally. In this
way the GSC's error signaling mechanism can correctly
terminate an errant transaction without adding complexity to
LASI's internal blocks.

Parity is generated in the GSC interface whenever LASI
sources data or an address on the bus. Parity is checked
whenever LASI is a data sink. LASI does not respond to
address parity errors on the GSC bus, which result in a
timeout error.

Arbitration. LASI contains six different blocks capable of
initiating a transaction on the GSC bus (see Fig. 1). To
initiate a transaction, a block must first own (or gain control of)
the GSC bus. Deciding which potential master owns the bus
is the job of LASI's arbitration block. The arbitration circuit
in LASI provides internal bus arbitration for all six internal
devices and provides external GSC arbitration signals for
the CPU and an expansion slot. This capability allows LASI
to function as the central arbiter for the GSC bus in low-end
systems. The arbitration circuit can also be pin-programmed
at reset to behave as a secondary arbitration device that is
controlled by another arbiter. This feature allows LASI to be
used in larger systems that provide their own arbitration
circuit. A second LASI can also be used for I/O expansion
in low-end systems in which the first LASI is providing the
central arbitration. Support for multiple LASIs on the same
GSC bus makes the speedy development of multifunction
I/O expansion boards a relatively simple task.

The LASI design was simplified by requiring that the LASI
arbitration circuit gain control of the GSC bus before granting
the internal bus to potential bus masters. This saved a
significant amount of complexity in the GSC interface block as well
as greatly reducing the number of cases that needed to be
tested during the verification effort. This simplification does
create a couple of wasted GSC cycles for each transaction
initiated by LASI. However, this inefficiency has a negligible
impact on system performance.

The LASI arbitration circuit provides a simple round-robin
scheme that provides roughly equal access to all devices.
The arbitration circuitry keeps track of the identity of the
last device granted the bus and all currently outstanding
requests. (A simple truth table makes sure the GSC resource
is handed out fairly.) If no devices are requesting the bus,
LASI will default to granting the bus to the CPU. This has a
small positive impact on performance, given that the CPU is
the most likely device to initiate the next transaction. This
arbitration scheme helps simplify the arbitration circuit by
not requiring it to monitor bus activity. Each bus master is
responsible for being "well-behaved" with respect to bus
use.
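
The round-robin policy described above can be sketched in
software. This is only a minimal model, not LASI's truth
table: the device names, their order, and the next_grant
function are assumptions made for illustration. Only the
behavior (rotate from the last device granted, default to the
CPU when nothing is requesting) comes from the text.

```python
# Illustrative model of a round-robin bus arbiter with a default grant.
# Device names and ordering are hypothetical, not taken from LASI.
DEVICES = ["cpu", "slot", "lan", "scsi", "audio", "parallel"]


def next_grant(last_granted, requests, default="cpu"):
    """Return the device to grant next.

    last_granted: name of the device that most recently owned the bus.
    requests: set of device names currently requesting the bus.
    """
    if not requests:
        # Nobody is asking: park the grant on the CPU.
        return default
    start = DEVICES.index(last_granted)
    # Search the devices after the last owner first, wrapping around,
    # so the last owner has the lowest priority this round.
    for offset in range(1, len(DEVICES) + 1):
        candidate = DEVICES[(start + offset) % len(DEVICES)]
        if candidate in requests:
            return candidate
    return default
```

Because the decision depends only on the last grant and the
current request set, such an arbiter does not need to observe
bus activity, which matches the simplification noted above.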

The arbitration circuit plays a key role in the error handling
strategy for LASI. If an error occurs on the GSC bus while
LASI is the bus master, the arbitration circuit will not grant
the bus to additional internal devices until the CPU clears the
error by clearing a bit in the arbitration circuit. This
simplifies the design of other devices within LASI by not requiring
them to use the error signal as an input to their state
machines. When an error is detected, the transaction will
terminate normally, but no additional transactions will be allowed
until the situation is rectified by software.

Interrupt Controller. A total of 13 different interrupt sources
exist on the LASI chip. Each interrupt source drives a single
signal to the interrupt controller block. When the interrupt
signal is asserted, the interrupt controller block will master
the bus and issue a word write to the I/O external interrupt
register (IO_EIR), which is physically located in the CPU. The
data transferred to the IO_EIR contains a value that indicates
the source of the interrupt. The address of the IO_EIR and the
interrupt source value can be programmed by writing to the
interrupt address register located in LASI's interrupt
controller block. Individual interrupt sources can be masked by
setting bits in the interrupt mask register.

LASI's interrupt controller is designed to provide a variety of
interrupt approaches. The Model 712 uses only one of these
alternatives. Asserting an interrupt causes a write to the
IO_EIR to be mastered on the GSC bus. Upon receiving an
interrupt from LASI (via IO_EIR), the CPU will read the
interrupt request register located in LASI's interrupt controller
block. One bit in the interrupt request register is designated
for each potential interrupt source in LASI. The interrupt
request register is cleared automatically after it is read by
the CPU.
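
The interrupt sequence used by the Model 712 can be
illustrated with a small software model. This is a sketch
under stated assumptions: the class and method names are
hypothetical, a set mask bit is assumed to suppress its
source entirely, and the register address used in any example
is made up. The clear-on-read behavior of the request
register is from the text.

```python
# Illustrative model of the interrupt flow: assert -> IO_EIR write ->
# CPU reads (and thereby clears) the interrupt request register.
NUM_SOURCES = 13  # number of interrupt sources given in the text


class InterruptController:
    def __init__(self, eir_address, eir_value):
        # Both values are programmed through the interrupt address register.
        self.eir_address = eir_address
        self.eir_value = eir_value
        self.mask = 0         # assumption: a set bit masks that source
        self.request = 0      # one bit per interrupt source
        self.bus_writes = []  # (address, data) word writes mastered on the bus

    def assert_interrupt(self, source):
        if not 0 <= source < NUM_SOURCES:
            raise ValueError("unknown interrupt source")
        if self.mask & (1 << source):
            return  # masked: ignored in this sketch
        self.request |= 1 << source
        self.bus_writes.append((self.eir_address, self.eir_value))

    def read_request_register(self):
        value = self.request
        self.request = 0  # cleared automatically on read
        return value
```

In this model the CPU learns that LASI interrupted from the
IO_EIR write and learns which source was responsible from a
single read of the request register.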

Real-Time Clock. The Model 712 needs to keep track of time
when the system power is off. To this end, LASI provides a
battery-backed real-time clock. The real-time clock is
logically very simple and consists of a custom oscillator circuit
and a 32-bit counter that can be read and written to by
software. The 32-bit counter is used to keep track of the number
of seconds that have elapsed from some reference time.

The oscillator unit operates at 32.768 kHz and typically uses
less than 10 µA of current when operating on battery backup.
It uses a minimum of external circuitry (consisting of two
capacitors, a crystal, and a resistor) to accomplish its task.

Inside the LASI real-time clock, the 32-kHz signal is reduced
to a 1-Hz signal by a 15-bit precounter. The 1-Hz signal is
then used to increment the main 32-bit counter. Both the
counter and the precounter are implemented using simple
ripple counters. The 15-bit precounter is always cleared
when software writes to the 32-bit counter.
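
The divider arithmetic is easy to check: a 15-bit precounter
rolls over every 2^15 = 32,768 cycles, so a 32.768-kHz
oscillator yields exactly one rollover per second, and a
32-bit seconds counter covers roughly 136 years. The sketch
below models this behavior. The class and method names are
hypothetical and the model ignores the ripple-counter
implementation.

```python
# Illustrative model of the real-time clock counters.
OSC_HZ = 32_768          # oscillator frequency, equal to 2**15
PRECOUNTER_BITS = 15


class RealTimeClock:
    def __init__(self):
        self.precounter = 0   # 15-bit divider
        self.seconds = 0      # 32-bit seconds counter

    def tick(self):
        """Advance by one oscillator cycle."""
        self.precounter = (self.precounter + 1) % (1 << PRECOUNTER_BITS)
        if self.precounter == 0:
            # Precounter rolled over: one second has elapsed.
            self.seconds = (self.seconds + 1) % (1 << 32)

    def write_seconds(self, value):
        """Software write: load the counter and clear the precounter."""
        self.seconds = value % (1 << 32)
        self.precounter = 0
```

Clearing the precounter on a write means the first second
after software sets the time is a full second long.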

Phase-Locked Loop Clock Generators. The goal for the LASI
clock subsystem was to generate all the I/O subsystem clocks
from one crystal oscillator over a wide range of system
frequencies. The LASI clock block generates five different
clock frequencies required for the wide variety of I/O
interfaces. Three of these clocks are subharmonics of the
processor clock, and are generated using simple digital state
machines. However, the 40-MHz clock and the audio sample
clock are fixed-frequency clocks. The 40-MHz clock is used
for the SCSI back end and RS-232 baud rate generator, and
the audio sample clock is used for the external CODEC chip.
The frequency of this clock (16.9344 MHz to 24.576 MHz) is
selectable on the fly by the audio and telephone interface.

Two digital phase-locked loop circuits are provided in LASI
to generate the two fixed-frequency clocks from the CPU
clock. These digital phase-locked loops implement the
equation: f_clockout = (f_clockin × N)/(M1 × M2), where N, M1, and
M2 are digital coefficients stored in LASI control registers.
f_clockin comes from the main system reference clock. The
f_clockout from one of the phase-locked loops is used for the
audio clocks and the f_clockout from the other phase-locked
loop is used for SCSI, RS-232, and other I/O functions. At
power-on, the processor initialization code (stored in the
flash EPROMs) loads the coefficients corresponding to the
processor clock for the particular product. The audio sample
clock has two sets of coefficient control registers, which
are selected by a multiplexer based on a signal from the
audio interface. Fig. 4 shows one of the phase-locked loop
circuits.

Fig. 4. Phase-locked loop clock controllers.
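
To illustrate how initialization code might pick coefficients
for the equation above, the following sketch evaluates
f_clockout for given values of N, M1, and M2 and searches for
a combination that best matches a target frequency. The
function names, the brute-force search, and the max_coeff
range are assumptions for illustration. The text does not
give the coefficient widths or how the values are actually
chosen.

```python
# Illustrative use of f_clockout = (f_clockin * N) / (M1 * M2).
from fractions import Fraction


def pll_output_hz(f_clockin_hz, n, m1, m2):
    """Exact output frequency for the given coefficients."""
    return Fraction(f_clockin_hz) * n / (m1 * m2)


def find_coefficients(f_clockin_hz, target_hz, max_coeff=16):
    """Brute-force search for (N, M1, M2) closest to target_hz.

    max_coeff is a hypothetical register range, not a LASI value.
    """
    best = None
    for n in range(1, max_coeff + 1):
        for m1 in range(1, max_coeff + 1):
            for m2 in range(1, max_coeff + 1):
                error = abs(pll_output_hz(f_clockin_hz, n, m1, m2) - target_hz)
                if best is None or error < best[0]:
                    best = (error, n, m1, m2)
    return best[1:]
```

For example, with a hypothetical 60-MHz input, N = 2 and
M1 × M2 = 3 give the 40-MHz clock exactly.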

The phase-locked loop circuits are completely digitally
controlled, including a digitally controlled oscillator, digital phase
detector, counters, and scan test hardware. This design
eliminates analog control voltages, which are susceptible to noise
and integration errors. The digitally controlled oscillator is a
ring oscillator with a digitally programmable delay element.
This design is capable of generating frequencies of up to 105
MHz. A combination of custom and standard-cell design
techniques is used in this design. Each phase-locked loop
cell measures 1500 µm by S90 µm.

General I/O Functions 

The blocks shown in Fig. 1 that make up the general I/O
functions include the parallel port, audio and telephone
interface, RS-232 port, and flexible disk and boot ROM interface.
These I/O functions originate from HP internal standard-cell
designs that were originally designed using Verilog RTL
models and then synthesized into a standard-cell design
using Synopsys. Some blocks were designed specifically for
LASI while others were leveraged from previous HP ASIC
designs.

Parallel Port. The parallel port is designed to be software
compatible with previous generations of HP 9000 Series 700
I/O subsystems while minimizing overall complexity and
chip area. This port allows interfacing to printers and other
peripherals supporting the industry-standard Centronics
parallel interface. The parallel port signals are driven
directly from the LASI chip without additional buffering.

DMA was supported on previous workstation controllers
and therefore needed to be provided on LASI's controller.
However, since no central DMA controller exists, all DMA
hardware is contained within the parallel I/O block. Since
parallel port bandwidth requirements are fairly modest
(about 400 kbytes/s), DMA is done by fetching one 32-bit
word of data, releasing the bus, transferring one to four
bytes of data over the interface, and then requesting the bus
again. This approach keeps the DMA controller quite simple
while easily accommodating byte unpacking.

Fig. 5. Address latching logic, and the data and control lines
associated with the external 8-bit bus.

Keyboard and Mouse Controller. LASI provides support for two
IBM PS/2-style keyboard and mouse devices, making the
keyboard and mouse ports just like those used on a standard
IBM personal computer. These interfaces are new to the
Series 700 family so there were no software compatibility
issues, allowing us to optimize the design for low
manufacturing cost. The interface provides only a minimal amount of
hardware and relies on the driver to do most of the work.
The interface also performs the serial-to-parallel and
parallel-to-serial conversion and does a small amount of
buffering. An interrupt is generated for every byte of data received
from the PS/2 device. The software overhead is not a
performance issue because of the extremely low data rate of the
interface.

Flexible Disk and Boot ROM Interface. LASI supports an
external 8-bit bus that provides the capability to connect discrete
flash EPROM devices and a flexible disk controller with
very little additional logic. Fig. 5 shows a simple schematic
of a flash EPROM and the required address latching logic on
the 8-bit bus. It was not cost-effective to integrate these
devices into the LASI chip. The 8-bit bus is also capable of
supporting other types of 8-bit devices, giving some degree of
flexibility to the I/O system.

The 8-bit bus supports 1M bytes of address space (the first
half of the LASI address space). All transactions to this
address space on the 8-bit bus begin with two address cycles.
These cycles transfer bits 18:3 of the address to two
74HCT374-type 8-bit latches wired in series and controlled
by LASI. Multiplexing the address on the data lines saves 15
pins on LASI.

LASI is capable of supporting byte, word, and double-word
reads and byte writes to devices on the 8-bit bus. Word and
double-word reads are accomplished by doing multiple
accesses to devices on the 8-bit bus and packing the bytes into
words before returning them on the GSC bus. Word and
double-word accesses require the address to be latched only
once since LASI drives the lower three address bits directly.
This greatly reduces the word and double-word access time.
Double-word reads take approximately 75 GSC cycles to
complete because eight accesses are required on the 8-bit
bus. During each of the eight accesses a new address is
presented to the flash EPROM, which results in valid data being
driven to the 8-bit bus by this flash device. Byte accesses are
also relatively slow (12 GSC cycles) to support very slow
devices on the 8-bit bus. It is important to note that the 8-bit
bus is not electrically connected to the GSC.
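
The packing of bytes into words can be sketched as follows.
This is an illustrative model only: the packed_read function
and its read_byte callback are hypothetical, and the byte
order shown (first byte most significant) is an assumption
based on PA-RISC being a big-endian architecture. What comes
from the text is that the upper address bits are latched once
and only the lower three bits change between accesses.

```python
# Illustrative model of packing bytes from an 8-bit device into words.
def packed_read(read_byte, address, size):
    """Read `size` bytes (1, 4, or 8) starting at `address`.

    read_byte is a hypothetical callback returning one byte from the
    8-bit bus. The upper address bits are latched once; only the low
    three bits change between accesses.
    """
    if size not in (1, 4, 8):
        raise ValueError("size must be 1, 4, or 8 bytes")
    if address % size:
        raise ValueError("address must be aligned to the access size")
    latched = address & ~0x7          # upper bits, latched once
    low = address & 0x7               # driven directly on each access
    value = 0
    for i in range(size):
        byte = read_byte(latched | (low + i)) & 0xFF
        value = (value << 8) | byte   # assumed order: first byte is most significant
    return value
```

A double-word read therefore costs eight device accesses but
only one address latch sequence, which is why it is much
cheaper than eight independent byte reads.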

LASI is designed specifically to support the WD37C65C
flexible disk controller on the 8-bit bus. The Model 712 uses a
personal computer style flexible disk controller instead of a
SCSI-based flexible disk controller because of the
significantly lower cost of the drive mechanism. The flexible disk
controller was not integrated into the LASI chip because of
the low cost of the WD37C65C chip and the potential for
SCSI drives to come down in cost in the future. The
WD37C65C shares the data bus and two control lines with
other devices on the 8-bit bus, but does not consume any of
the 1M bytes of allocated address space. Supporting the
WD37C65C requires six dedicated signals and no external
glue logic. LASI supports the WD37C65C running in DMA
mode and provides the capability to move data directly
between main memory and the WD37C65C without processor
intervention.

RS-232. The RS-232 block in LASI is an HP internal
standard-cell design that emulates the behavior of the National
Semiconductor NS16550A. The Verilog HDL description for this
design was leveraged from previous HP ASIC designs used in
other members of the HP 9000 Series 700 workstation family.

One difference between this block and the NS16550A is that
its baud clock is derived from a 40-MHz signal. This allows
the block to share the phase-locked-loop-generated 40-MHz
clock with the back end of the SCSI block and eliminates
the need to support an external crystal or dedicated
phase-locked loop for baud clock generation.

Audio Interface. The Model 712 supports built-in CD-quality
audio and an optional telephony card. The telephony card
is DSP-based and provides simultaneous access to two
telephone lines, both capable of supporting voice, fax, or data
modems. LASI provides the interface between the GSC bus
and the audio and telephony circuitry.

An objective for the Model 712 audio subsystem was to
maintain complete software compatibility with previous
discrete designs. As a result, a good deal of the audio interface
circuitry on LASI is dedicated to supporting this compatibility
and is not optimized for minimal manufacturing cost.

The audio interface in LASI has two DMA channels that
support the input and output audio streams. Each channel has
two 4K-byte pages of main memory continually reserved for
transferring data to and from the CS4215 CODEC. The
buffering in the interface is sufficient to guarantee isochronous
audio operation, given worst-case GSC bus latencies in the
Model 712. A wide range of audio formats is supported,
including 8-bit or 16-bit words sampled in either linear, µ-law,
or A-law format at a variety of sample rates from 8 kHz to 48
kHz. The clock that determines the sample rate in the
CODEC is generated in one of LASI's programmable
phase-locked loop circuits. Communication between LASI and the
CODEC is accomplished via a full-duplex, serial bit stream.

The high-speed serial bus over which LASI communicates
with the daughter card is similar to a concentrated highway
bus developed by AT&T but has several modifications. The
core pinout is the same using the signals data transmit (DX),
data receive (DR), and frame synchronization (FS), but the
definition of the bus has been extended to incorporate
control of external buffers and bus reset.

Communication with the telephony card is accomplished via
two TTY channels internal to LASI. The serial concentrated
highway bus data is multiplexed onto the high-speed serial
stream and sent to the CODEC and the telephony card.
Since TTY devices are used, the driver for the telephone
system is a highly leveraged version of the existing TTY
drivers. The audio interface and HP Teleshare have a common
digital interface which resides in LASI. HP Teleshare is
described in more detail in the article on page 69.

Megacell I/O Functions

LASI contains two megacells whose designs were purchased
by HP from external vendors. The decision to do this was
based on maintaining software compatibility with past HP
9000 Series 700 workstations and the availability of
engineering resources in HP. In both cases, an important goal
was to maintain the integrity of the megacell as much as
possible. A definite boundary was drawn between
functionality leveraged from external vendors and new design work.
This boundary proved vital to functional verification and
production testing.

LAN Megacell. IEEE 802.3 LAN support is provided by a
megacell derived from the Intel 82C596 LAN coprocessor. To
understand the integration, two key areas should be
considered. First, importing the megacell at the artwork level
solved some problems and imposed others. Second, in the
area of interfacing, the integrated megacell eliminated a
substantial number of chip pins but raised some protocol
issues that had to be overcome.

The LAN megacell was imported into our IC design flow at
the artwork level. Because the original Intel design was
done in a custom fashion, a netlist translation would have
required a significantly longer design time and a much larger
manpower deployment than the artwork translation. Even at
the artwork level, several modifications were made because
of differences between the original CMOS process design
rules and those of our target process.

One challenge in importing the megacell at the artwork level
was developing a verification strategy that allowed
concurrent simulation of the megacell and the rest of the chip.
Because the megacell vendor used proprietary simulators
running in a mainframe environment, the vendor simulation
models couldn't be used in our Verilog-based environment.
Hardware modeling was explored, but characteristics of the
part made this solution impractical. Converting either
functional representations or transistor-based representations to
Verilog HDL raised too many concerns about modeling
accuracy. In view of these roadblocks, an unconventional
approach to simulation modeling was employed. First, a
FET-level model was extracted from the artwork. This model was
turned on and verified using Intel's production test vectors
and a proprietary in-house simulator. Second, the in-house
simulator was compiled and linked into the Verilog simulator
using a procedural-level interface. Third, a Verilog HDL
interface module was written that defined synchronization
events for data transfer between the two simulators, and the
model was reverified using production vectors. Finally, tests
were run that were specifically designed to test the interface
between the megacell and the internal bus.

Integrating the LAN megacell did provide a clear win by
improving the ratio of I/O to core area. When sold as a separate
device, the Intel 82C596 has 89 signal pins devoted to the
host interface. Once the megacell was integrated, all of
these signals remained on-chip. In addition, 77 of the
removed signal pins had output drivers, so the associated
power and ground pins were eliminated.

The megacell did require a small amount of circuitry to
interface the 82C596 bus to the LASI internal bus. The primary
difficulty in this area was burst transactions. The system bus
wanted to know at the start of the transaction how many
words were to be bursted. In contrast, the 82C596 burst
protocol would only indicate whether or not it had one more
word to burst. To minimize complexity and avoid the area
associated with a FIFO buffer, the decision was made to
support only two-word bursts. This logical intersection of
the two bursting protocols provided a bandwidth utilization
improvement over nonbursted transactions while minimizing
chip area and development time.

SCSI Megacell. To provide SCSI-2 support, LASI uses an NCR
53C710 megacell. This megacell was imported into our design
methodology as a netlist port. The design was translated from
NCR's standard-cell library to HP's cell library. A few unique
components were added to HP's library specifically to
support the SCSI megacell. While this created some challenges,
doing a schematic port allowed more flexibility to optimize
the aspect ratio of the megacell for a more efficient floor plan.
This technique also masked differences between NCR's
process and HP's process. Verilog models for this schematic
port were simulated in the conventional way.

The programming and SCSI bus model for the 53C710
megacell is completely compatible with the industry-standard
component marketed by NCR. However, the host side
interface of the megacell is modified to eliminate the pads
and replace them with standard-cell components. These
components connect directly to internal megacell signals,
providing an interface to the chip's internal bus.

The 53C710 can be a master and a slave device on the GSC
bus. LASI's internal bus protocol for slave transactions
requires only combinational logic between the megacell and
the internal bus. As a slave, byte and word transactions are
supported in the megacell. If SCSI is a bus master, the
interface supports all the transaction types needed by the
megacell, with the help of a small state machine located in the
SCSI interface block shown in Fig. 2. SCSI data is typically
transferred using four-word read and write transactions on
the GSC bus.

Test Support 

The primary objective for LASI testing was to provide an
extremely high level of coverage with a limited amount of
test development resources. Test support was complicated
by the diverse nature of the circuits on LASI.

The non-megacell functionality is tested by a combination of
parallel pin vectors in conjunction with automatically
generated scan vectors. LASI has an enhanced JTAG (IEEE
1149.1) test block, 25 distributed internal I/O device scan
chains, and embedded test functionality in the I/O pads. The
JTAG test block contains a test access port and
boundary-scan architecture defined in IEEE standard 1149.1-1990 and
private instructions used for clock control, full-chip step
control, and specific scan-chain functions.

To maximize test coverage for the two megacells and to
minimize the required test development resources, the
vectors used for production testing by Intel and NCR are used
on LASI. Doing this requires multiplexing all megacell
signals to pads to create what looks to the chip tester like an
Intel 82C596 or an NCR 53C710, depending on the test
mode.

This technique provides important verification and test
coverage, but complicated the design. Each output pad includes a
three-input multiplexer, and each input pad drives signals to
three destinations on-chip, significantly increasing the
loading. The additional routing complexity required devoting
more space for routing channels, and the larger pads reduced
placement flexibility.



Conclusions

Integrating multiple I/O functionality onto a single VLSI chip
can significantly reduce the cost of the I/O subsystem.
However, many system dependent factors and each candidate
functionality need to be examined carefully in the system
context before deciding to integrate. Some important system
considerations are software compatibility, the cost of
discrete alternatives, the cost of printed circuit board area,
customer connect rate, available IC fabrication capacity,
available engineering development resource, and so on. The
LASI chip definition is the result of a detailed investigation
into optimizing an I/O system for HP's low-end workstations.



An Integrated Graphics Accelerator
for a Low-Cost Multimedia
Workstation



Designing with a system focus and extracting as much performance and 
functionality as possible from available technology results in a highly 
integrated graphics chip that consumes very little board area and power 
and is 50% faster and five times less expensive than its predecessor. 

by Paul Martin 



The graphics subsystem of the Model 712 workstation is a
high-performance, low-cost solution that sits directly on the
system bus of the Model 712 and consists of the graphics
chip, a video RAM-based frame buffer, and a few support
chips (see Fig. 1). The project goals closely reflect those of
the overall HP 9000 Model 712 program. In priority order
these goals were:

• Very low manufacturing cost

• Leadership graphics performance at entry cost levels

• Architectural compatibility

• Compelling new functionality

Achieving these goals required a major step in the evolution
of HP entry-level graphics workstation hardware. Two
philosophies helped the team responsible for the graphics chip
achieve these goals. The first guiding philosophy was to
design with a system-level focus. We examined all required
functionality to decide whether it was best to implement it
in the graphics subsystem, the host processor, or some
combination of the two.

Fig. 1. A block diagram of the essential components that make up
the HP 9000 Model 712 workstation.

The second philosophy was to extract as much performance
and functionality as possible from readily available
technology. We avoided leading-edge technology because of the cost
implications. We did make an attempt to use all the features
and performance available in mature technologies such as
video RAMs (VRAMs) and HP's CMOS26B IC process.

This article describes the features and functionality of the
HP 9000 Model 712 graphics subsystem. The considerations
that went into accomplishing the goals mentioned above are
also described.

Architectural Compatibility

The CRX window accelerator card† introduced by HP in
1991 marked the beginning of a standardized graphics
hardware architecture for window system acceleration. This
architecture was chosen for its simplicity of implementation
and for the clean model it presents to the software driver
developers. One of our fundamental design decisions was to
accelerate key primitives only, a RISC approach. Many
earlier controllers chose to accelerate a large gamut of
graphical operations such as ellipses, arithmetic pixel operations,
and so on. Graphics subsystems designed with these
controllers were typically expensive and exhibited only
moderate window system performance. In the CRX and subsequent
accelerators, including the Model 712's graphics chip, we
decided to accelerate a carefully chosen smaller set of
primitives, which are described in the following sections.

Block Transfer. Writing pixels from system memory to the
frame buffer or reading from the frame buffer to system
memory is a block transfer (see Fig. 2). Writes are used to
transfer image data to the frame buffer. Reads are used
primarily to save portions of the screen temporarily obscured
by pop-up menus (see Fig. 2b).



† A window accelerator is the hardware that provides the images seen on the workstation
monitor. In particular, an accelerator is geared toward speeding up environments such as
the X Window System. The window accelerator enables the fast movement of windows on
the screen, scrolling of text, painting of window borders and backgrounds, and so on.

Fig. 2. (a) Block transfer write. (b) Block transfer read. Window B
obscures window A. The obscured area is stored in system memory
for restoration when the area of window A is exposed.

Block Move. A block move involves transferring pixels from
one rectangular area in the frame buffer to another (possibly
overlapping) area in the frame buffer (Fig. 3). This is very
useful for moving windows on the screen and scrolling lines
of text within a window. The block move in the graphics
chip supports Boolean operations on the data being moved,
such as highlighting text by complementing colors.
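
The main subtlety in a block move is the overlapping case:
pixels must be read before they are overwritten. A common
software approach, sketched below, is to choose the copy
direction from the relative positions of source and
destination. This illustrates the general technique, not the
graphics chip's implementation. The block_move function and
its op parameter are hypothetical names.

```python
# Illustrative block move within a frame buffer, safe for overlap.
def block_move(frame, src_x, src_y, dst_x, dst_y, width, height, op=None):
    """Copy a rectangle within `frame` (a list of rows), in place.

    op, if given, combines (source_pixel, destination_pixel) into the
    value written; by default the source pixel is copied unchanged.
    """
    # Choose the traversal order so source pixels are read before
    # they are overwritten when the rectangles overlap.
    rows = range(height) if dst_y <= src_y else range(height - 1, -1, -1)
    cols = range(width) if dst_x <= src_x else range(width - 1, -1, -1)
    for r in rows:
        for c in cols:
            src = frame[src_y + r][src_x + c]
            dst = frame[dst_y + r][dst_x + c]
            frame[dst_y + r][dst_x + c] = src if op is None else op(src, dst)
```

Passing an op that inverts the source pixel gives the
complementing behavior mentioned above for highlighting text.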

Vectors. The ability to draw vectors (line segments) very
quickly is a requirement of design applications such as
schematic capture and mechanical design (Fig. 4). Thus, the
graphics chip has a high-performance vector generator that
creates X Window System-compliant line segments.

Fast Text. Characters are accelerated by the graphics chip
because of their pervasive use in window systems and the
large potential for performance improvement over
software-only solutions. A character is defined as a rectangular array
of pixels that contains only two colors called foreground
and background colors. Because there are only two choices,
a single bit is sufficient to specify the color of each pixel in a
character. This improves performance by reducing the
amount of data that is transmitted from the processor to the
graphics chip. For example, the hp character in Fig. 5 requires
only 8 bytes of data versus 48 bytes if this optimization had
not been made.
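
The 8-byte and 48-byte figures are consistent with a glyph 6
pixels wide and 8 pixels high in an 8-bit-per-pixel frame
buffer, with each 1-bit row padded to a whole byte. Those
dimensions and the padding rule are assumptions made here to
reproduce the arithmetic. They are not stated in the text,
and the function names are illustrative.

```python
# Illustrative arithmetic for two-color character data.
def bitmap_bytes(width, height):
    """Bytes needed at 1 bit per pixel, each row padded to a whole byte."""
    bytes_per_row = (width + 7) // 8
    return bytes_per_row * height


def pixel_bytes(width, height, bits_per_pixel=8):
    """Bytes needed when every pixel carries a full color value."""
    return width * height * bits_per_pixel // 8


def expand_row(row_bits, width, foreground, background):
    """Expand one packed row byte (most significant bit first, width <= 8)
    into a list of pixel colors."""
    return [
        foreground if (row_bits >> (7 - x)) & 1 else background
        for x in range(width)
    ]
```

Under these assumptions the processor sends one sixth of the
data, and the expansion to full pixel values is left to the
graphics hardware.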

Rectangular Area Fill. This primitive is widely used by
window systems to generate window borders, menu buttons,
and so on (Fig. 6). It is also important for applications such
as printed circuit board layout and IC physical design.
Rectangular areas can be patterned using two colors or contain
only a single color. Hardware acceleration again gives a
large speedup over software-only solutions.

Cursor. Until the late 1980s when hardware cursors started
appearing in video ICs, screen cursors were typically
generated using software routines. Hardware support is a good
trade-off because the circuitry is relatively simple, and a
system without hardware acceleration can spend a
significant portion of its time updating the cursor. A 64-by-64-pixel,
two-color cursor is supported directly in the graphics chip.

Fig. 4. Vector primitive. A vector is drawn by turning on successive
pixels using the Bresenham algorithm.

More complex functionality such as wide lines, circles and
ellipses, and 3D primitives are not accelerated directly by
the graphics chip because the application performance
improvement was determined to be too low for the cost of
implementation. These functions can be efficiently
implemented in software. This is an example of the system-level
design trade-offs mentioned above.

An important aspect of this standardized architecture is
software leverage. It is estimated that several software
engineering years were saved on the graphics chip because
the architecture is virtually identical to that of the CRX
graphics subsystem. The savings in software engineering
time was applied to tuning and adding new functionality
instead of rewriting drivers.

Graphics Chip Operation 

To get a better understanding of the operation of the graphics
chip let's follow a graphics primitive through the block diagram
shown in Fig. 7. A vector is a good example because it
involves all of the blocks in the chip. Assume we have a vector
that starts at x,y coordinates 0,0, is 8 pixels long, and has
a slope of 1/2.

First, several parameters are calculated to set up the vector
in the graphics chip. This is done by graphics software (e.g.,
the X Window System) running on the PA 7100LC CPU. The
high-level specification of a vector is:

• Starting x,y coordinate

• Ending x,y coordinate.

Fig. 3. Block move. Rectangular area A is moved to a new, possibly
overlapping location.



Fig. 5. Fast text primitive. A character is a rectangular array
containing two colors, foreground and background colors. Only a single
bit is needed to specify each color.

This data is transferred across the GSC bus, through the
GSC interface, and into a set of registers in the macro function
unit. If these registers are already in use by the macro
function unit, the data is placed in a 32-word-deep FIFO
buffer that the unit can access when it becomes free. This
increases efficiency by allowing overlap between the software
and hardware processes. The macro function unit's
basic job is to break down the high-level descriptions of
graphics primitives such as vectors, text, and rectangles into
a series of individual requests to draw pixels.

Drawing the vector is automatically triggered when the last
of the parameters described in the specification is written
into the macro function unit. The macro function unit then steps
its way along the vector using the Bresenham algorithm
and issues requests to draw pixels. Since the slope of our
vector is 1/2, the y-coordinate is incremented after every
two steps along the x-axis as indicated in Fig. 8.
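
The stepping described above can be sketched in software. The following is a minimal integer Bresenham routine, not the chip's actual logic; the endpoint (8,4) and the choice to look only at the first eight pixels are assumptions made to match the example vector, and the tie-breaking rule is just one common convention.

```python
def bresenham(x0, y0, x1, y1):
    """Return the pixels of a line from (x0, y0) to (x1, y1), endpoints included."""
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x1 >= x0 else -1
    sy = 1 if y1 >= y0 else -1
    err = dx - dy  # running error term, integers only
    x, y = x0, y0
    pixels = []
    while True:
        pixels.append((x, y))
        if (x, y) == (x1, y1):
            break
        e2 = 2 * err
        if e2 > -dy:  # step along x
            err -= dy
            x += sx
        if e2 < dx:   # step along y
            err += dx
            y += sy
    return pixels


# Slope-1/2 vector starting at 0,0; keep the first eight pixels.
vector = bresenham(0, 0, 8, 4)[:8]
print(vector)
```

With this tie-breaking rule the y-coordinate advances once for every two steps in x, which is the behavior the text describes for a slope of 1/2.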

One might expect that a separate x- and y-address would be
specified for each pixel to be written. However, with vectors
there is excellent coherence between successive x- and
y-addresses as pixels are drawn sequentially along the vector.
Thus, there are special bus cycles between the macro function
unit and the data formatter that specify that the previous
x- or y-coordinate should be incremented or decremented
to generate the new coordinate. This saves sending a
full x,y coordinate pair for each pixel drawn and significantly
improves bandwidth use on the bus. This optimization is
also useful for other primitives such as text and rectangles.

Fig. 6. Rectangular area fill primitive. A rectangle is defined by
corner, width, and height. Color or pattern may be applied.

Fig. 8. Pixel representation of a vector that starts at coordinate 0,0,
is 8 pixels long, and has a slope of 1/2.
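
The increment/decrement idea can be sketched as a simple encoding: send the first coordinate in full, then only a small step for each following pixel. The function names and the step representation below are hypothetical; the text does not describe the actual bus cycle format.

```python
def to_steps(pixels):
    """Encode a pixel run as a start coordinate plus per-pixel (dx, dy) steps."""
    steps = []
    for (x0, y0), (x1, y1) in zip(pixels, pixels[1:]):
        dx, dy = x1 - x0, y1 - y0
        if abs(dx) > 1 or abs(dy) > 1:
            raise ValueError("pixels are not adjacent; a full address is needed")
        steps.append((dx, dy))
    return pixels[0], steps


def from_steps(start, steps):
    """Rebuild the pixel run by incrementing or decrementing the previous coordinate."""
    x, y = start
    pixels = [(x, y)]
    for dx, dy in steps:
        x, y = x + dx, y + dy
        pixels.append((x, y))
    return pixels


run = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3)]
start, steps = to_steps(run)
print(start, steps)
```

Each step carries only a direction for x and y, so it needs a few bits rather than a full coordinate pair.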

The data formatter's job is to take requests and data from
the macro function unit and format them in a way that is
best for the frame buffer. In the case of our vector, the pixel
addresses received by the data formatter are coalesced into
rectangular tiles that are optimized for the frame buffer. The
data formatter also recognizes when special VRAM modes
may be enabled to improve performance, based on the sequence
of data it receives from the macro function unit. For
example, page mode (which is described in more detail later
in this article) would be enabled during a vector draw. The
data formatter also stores the current pixel address, vector
color, and a host of other parameters for other primitives.

The frame buffer controller generates signals for the VRAMs
based on the requests from the data formatter. The controller
looks at the sequence of writes and reads requested and
adjusts the timing on the VRAM signals to maximize performance.
For our vector, we only need to do simple writes
into the frame buffer, and cycles can be as fast as ^3 ns per
pixel. More complex primitives might require data to be
read, modified, and written back, possibly to a different
frame buffer location.

The graphics chip supports an 8-bit-per-pixel frame buffer.
This means that, using normal techniques, only 256 colors
can be displayed simultaneously. This is not always adequate
for today's graphics-oriented systems. Two methods
can be employed to increase the perceived number of colors.
The first is dithering, in which an interleaved pattern of two
available colors is used to visually approximate a requested
color that is not directly available. The second approach is
color recovery. Color recovery is visually superior to dithering
and is described later.

The Model 712's entry-level configuration frame buffer uses
four 2M-bit VRAM parts, which allows screen resolutions of
up to 1024 by 768 pixels. Adding four more VRAM chips on a
daughter card enables screen resolutions up to 1280 by 1024
pixels.
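
A quick capacity check shows why four parts suffice for the lower resolution and eight are needed for the higher one. The sketch assumes that "2M-bit" means 2 x 2^20 bits per part; the helper names are illustrative.

```python
VRAM_BITS_PER_PART = 2 * 1024 * 1024  # assumption: 2M-bit = 2 * 2**20 bits


def frame_buffer_bytes(parts):
    """Total frame buffer capacity in bytes for a given number of VRAM parts."""
    return parts * VRAM_BITS_PER_PART // 8


def screen_bytes(width, height, bits_per_pixel=8):
    """Bytes needed to hold the visible screen image."""
    return width * height * bits_per_pixel // 8


for parts, (width, height) in ((4, (1024, 768)), (8, (1280, 1024))):
    total = frame_buffer_bytes(parts)
    visible = screen_bytes(width, height)
    print(parts, "parts:", total, "bytes;", width, "x", height, "needs", visible,
          "leaving", total - visible, "bytes off screen")
```

The memory left over after the visible image is what remains available for off-screen data.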

In addition to the screen image data, data for the cursor,
color lookup table, and attributes are stored in offscreen
frame buffer memory. This is an area in the video RAM
frame buffer that is never directly displayed on the CRT.
Data in this region is accessed in exactly the same fashion
as the screen image data, presenting a consistent interface
to software driver writers.

At this point our vector exists in the frame buffer but cannot
be seen by the user. The video block is responsible for
getting the screen image data from the frame buffer and
converting it for display on the monitor. This display process
is asynchronous to the rendering process which placed our
vector in the frame buffer.

To get the data in the frame buffer to the monitor, the video
controller first sends a request to the frame buffer controller
to access the frame buffer data. This data is requested in
sequential or scan-line order to match the path of the beam
on the monitor. Next, the data from the frame buffer is run
through a color lookup table to translate the 8-bit values into
8 bits each of red, green, and blue. The graphics chip supports
two independent color lookup tables which are selected
on a per-displayed-pixel basis by the attribute data. This feature
helps eliminate color contention between applications
sharing the frame buffer. Finally, cursor data is merged in by
the video block and the digital video stream is converted to
analog signals for the monitor.
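
The lookup step can be pictured with a short sketch. It assumes, for illustration only, that the attribute data reduces to an index (0 or 1) selecting one of two 256-entry tables; the table contents and the name `scan_out` are hypothetical.

```python
def scan_out(pixels, attributes, luts):
    """Translate 8-bit pixel values to (r, g, b) using the table each attribute selects."""
    return [luts[attr][value] for value, attr in zip(pixels, attributes)]


# Two hypothetical 256-entry tables: a gray ramp and its inverse.
GRAY = [(i, i, i) for i in range(256)]
INVERTED = [(255 - i, 255 - i, 255 - i) for i in range(256)]

# The same stored value (200) is shown differently depending on the attribute.
rgb = scan_out([0, 200, 200], [0, 0, 1], [GRAY, INVERTED])
print(rgb)
```

Because the table is chosen per pixel, two applications can each keep their own color map without disturbing the other's window.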

This completes an overview of the life of a vector primitive,
from a high-level description in the software driver to display
on the monitor. This basic data flow is the same for
other primitives such as rectangles and text.

Low Manufacturing Cost

Low cost was the primary objective for the graphics chip
design. As a measure of our success, the manufacturing cost
for the Model 712 graphics subsystem is 1/3 the cost of the
original CRX graphics subsystem. In addition, the entry-level
1024-by-768-pixel version of the graphics chip costs five
times less than the CRX subsystem.

These cost reductions were achieved primarily through an
aggressive amount of integration, which is summarized in
Fig. 9. The graphics chip represents the culmination of a
series of optimizations of the CRX family, combining almost
the entire GUI (graphical user interface) accelerator onto a
single chip. The only major function not currently integrated
is the frame buffer. Frame buffer integration is not feasible
today because RAM and logic densities are not quite high
enough and there is currently a cost advantage to using
commodity VRAM parts.

Since the introduction of the CRX subsystem, industry trends
such as denser and cheaper memory and inexpensive IC
gates have contributed to cost reductions in graphics hardware.
However, the graphics chip's high level of integration
also contributes cost reductions in the following areas:

• Elimination of value-priced parts. The color lookup table
and the digital-to-analog converter (DAC) have traditionally
been an expensive component of the graphics subsystem.
This is especially true for systems capable of high resolution
(1280 by 1024 pixels, 135 MHz) and having multiple
color lookup tables, such as the one built into the graphics
chip. The digital phase-locked loop in the graphics chip
replaces another expensive external part.

• The density of FETs achieved with the graphics chip, over
4500/mm², is significantly higher than with previous generations.
This is important because silicon area is a major
contributor to overall design cost.

• IC packaging and testing contribute significantly to the cost
of each chip in a system. Reducing the number of chips eliminates
this overhead. The graphics chip has a full internal scan
path and many internal signature registers to reduce test
time and chip cost significantly.

• Printed circuit board area is a significant system cost. The
elimination of a large number of chips not only reduced the
printed circuit board area from about 60 in² for the CRX to
14 in² for the graphics subsystem in the Model 712, but
allowed the graphics to be integrated directly onto the
motherboard, eliminating connectors, a bulkhead, and other
mechanical components.

• Power consumption for the graphics subsystem in the
Model 712 is only six watts. This low power consumption
reduces power supply capacity and cooling requirements
and therefore cost.

• Manufacturing costs associated with parts placement, test,
and rework are proportional to the number of discrete components
in a system. The graphics chip and other chips
in the Model 712 include JTAG (IEEE 1149.1) capability and
signature generators to reduce the cost of printed circuit
board test.

Several factors made this high level of integration practical.
The first is improved VLSI capabilities such as increased FET
density, decreasing wafer costs, and the availability within
HP of video DAC technology. The second is the desktop
availability of design and simulation tools capable of handling a
model of over 300,000 gates and 500,000 transistors. VLSI
design and verification were accomplished on HP 9000
Series 700 workstations using Verilog, Synopsys, and many
in-house IC development tools. The performance of the
workstations allowed the gate-level simulation of entire
video frames (1/60 s of operation) of over 1.2 million pixels,
which was the first time this was accomplished within HP.

Performance

The integration described above has also resulted in significant
performance benefits. The two major reasons for the
performance benefits are wider buses and increased clock
rates.

Wider buses are possible between blocks when they are on
the same piece of silicon. Wider buses allow better communication
bandwidth at a given clock rate, with very little cost
impact. A good example on the graphics chip is the much
improved communication between the macro function unit
and the data formatter, which once existed as separate chips.



Fig. 9. The evolution of HP's accelerator.



Increased clock rates are possible because of the elimination
of chip-to-chip synchronization delays, pad delays, and
printed circuit board trace delays. This compounds the
bandwidth benefit of wider buses. HP's CMOS26B technology
allows the bus interface, macro function unit, and frame
buffer controller blocks of the graphics chip to operate at 80
MHz while the three DACs and two color lookup tables of
the video block operate at 135 MHz.

Intelligent system-level design also made major contributions
to performance. A simple example is the block transfer
commands, which are responsible for transferring data from
system memory to the graphics chip and its frame buffer. A
special mode was introduced to the memory and I/O controller
in the PA 7100LC which allows fast sequential double-word
transfers without incurring the overhead of two single-word
transfers. This simple change boosted block transfer
performance by 50%.

Besides designing with a system-level focus, the other
driving philosophy was to extract as much performance and
utility as possible from available technology. A good example
of this is the use of the advanced features available in the
latest 2M-bit and 4M-bit VRAMs. HP has been instrumental
in proposing and driving many of these enhancements
within the JEDEC committee over the last few years. The
more important features include:

• Page mode. This feature eliminates the need to send redundant
portions of the pixel address when writing into the
frame buffer. The result is that many operations can write a
pixel in as little as 37.5 ns versus the more typical 70 ns (see
Fig. 10). The key here is that these operations must occur
within a page of VRAM or a significant penalty is incurred.
By default this page is long and narrow, which is good for
block move and block transfer operations but bad for randomly
oriented vectors and rectangles. To achieve a better
performance balance, we made use of the next feature.

• Stop register/split transfer. This feature allows the frame
buffer to be organized in pages that are more square than
long and narrow. Moving to this organization improves random
vector and small rectangle performance significantly
while only slightly reducing large horizontal primitive
performance (see Fig. 11).

Fig. 10. An illustration of the performance improvement possible
using the page mode to write pixels into the frame buffer. This
example compares the performance of each mode when just four
pixels are transferred to the frame buffer.

• Block write. As mentioned earlier, operations such as text
and rectangular fill frequently require only one or two colors
to be selected on a per-pixel basis. For this reason
VRAMs provide a mode (via a single bit) in which a pixel's
color can be selected from an 8-bit foreground or background
color stored in the VRAMs. This translates into an 8x
performance improvement for these types of operations.
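
The effect of page shape on page mode can be sketched by counting how often a primitive leaves the current page. The page dimensions below (2048 by 1 and 64 by 32 pixels) are hypothetical values chosen only to contrast a long, narrow page with a more square one of the same size; the real VRAM page geometry is not given in the text.

```python
def page_changes(pixels, page_w, page_h):
    """Count how many times consecutive pixel writes land in a different page."""
    pages = [(x // page_w, y // page_h) for x, y in pixels]
    return sum(1 for a, b in zip(pages, pages[1:]) if a != b)


NARROW = (2048, 1)   # hypothetical long, narrow page
SQUARE = (64, 32)    # hypothetical squarer page holding the same pixel count

vertical = [(10, y) for y in range(100)]
horizontal = [(x, 10) for x in range(100)]

for name, line in (("vertical", vertical), ("horizontal", horizontal)):
    print(name,
          "narrow:", page_changes(line, *NARROW),
          "square:", page_changes(line, *SQUARE))
```

In this sketch the vertical line changes page on almost every pixel with the narrow organization but only a few times with the square one, while the horizontal line pays a single extra page change, which matches the trade-off described for the stop register/split transfer feature.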

The graphics chip's performance is summarized in Table I.
The table compares the performance of the graphics chip at
its theoretical hardware limit to its performance in 80-MHz
and 60-MHz Model 712 workstations and the Model 720
CRX. The final row in Table I, Xmark, is an industry-standard
metric that is an average of several hundred X Window System
tests.

Note that the graphics chip's hardware limit is significantly
higher than the Model 712 system performance limits. This
headroom means that future systems with higher levels of
CPU performance or even more highly tuned software drivers
will be capable of even better window system performance.



Table I
Summary of the Graphics Chip's Performance

Benchmark                             Hardware  Model   Model   CRX
                                      Limit     712/80  712/60  720

Block transfer 8-bit pixels/s         96 M      60 M    52 M    42 M
(frame buffer to system memory)

Block transfer 8-bit pixels/s         20 M      9 M     8 M     2 M
(system memory to frame buffer)

Block move pixels/s (frame buffer     47 M      40 M    31 M    40 M
to frame buffer, 500 by 500 pixels)

Vectors/s (10-pixel, X compliant)     2.1 M     1.4 M   1.1 M   1.? M

Text characters/s (6 by 13            1.0 M     681 k   385 k   295 k
pixels/character)

Rectangles/s (10 by 10                1.7 M     790 k   588 k   270 k
pixels/rectangle)

Xmark                                 -         7.9     6.0     5.6

Compelling Functionality 

Beyond improving performance and dropping cost substantially,
it was an important goal to include useful new functionality
in the graphics chip. Below are some of the more
important additions.

Software Video Support. One of the design goals for the Model
712 was to be able to play MPEG and H.261 video sequences
without expensive hardware acceleration. Through careful
analysis of the decoding process it became clear that this was
possible at full frame rates and high visual quality using a
combination of the following algorithmic, PA 7100LC, and
graphics enhancements:

• Rewriting the standard decode algorithms to make them as
efficient as possible

• Adding key instructions to the PA 7100LC

• Implementing YUV-to-RGB color space conversion in the
graphics chip.

YUV encoding is used in many video formats. It allocates
proportionately more bits to encode the brightness or luminance
(Y) of the image, and fewer bits to represent the color
(UV) in the image. Since the human eye is more sensitive to
brightness than color this is an efficient scheme. However,
since the graphics chip's frame buffer is stored in RGB format,
a conversion from YUV to RGB is necessary.

This conversion is a good example of an operation that was
relatively expensive in software (a 3-by-3 16-bit matrix multiply)
but simple to do in the graphics chip hardware.
This simple addition alone improves video playback performance
by as much as ^ffKi and helps enable full 30-frame/s
320-by-288-pixel resolution MPEG playback on a Model 712.
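
To show what a 3-by-3 color space conversion involves, here is a minimal floating-point sketch. The coefficients are the commonly used full-range BT.601 (JPEG-style) values and are an assumption: the text does not state which matrix, offsets, or fixed-point precision the graphics chip applies, and the function names are illustrative.

```python
# Assumed full-range BT.601 coefficients; rows produce R, G, and B.
YUV_TO_RGB = (
    (1.0, 0.0, 1.402),
    (1.0, -0.344136, -0.714136),
    (1.0, 1.772, 0.0),
)


def clamp8(value):
    """Round to the nearest integer and limit to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y, u, v):
    """Convert one 8-bit YUV pixel (U and V centered on 128) to 8-bit RGB."""
    vec = (y, u - 128, v - 128)
    return tuple(clamp8(sum(c * s for c, s in zip(row, vec))) for row in YUV_TO_RGB)


print(yuv_to_rgb(128, 128, 128))  # mid gray stays gray
print(yuv_to_rgb(128, 128, 255))  # strong V pushes the pixel toward red
```

Each output pixel costs up to nine multiplies plus clamping, which is why moving this step from software into hardware is attractive for video playback.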

HP Color Recovery. The graphics chip incorporates a new
display technology called HP Color Recovery. Using a low-cost
8-bit frame buffer and HP Color Recovery, the graphics
chip can display images that are in many cases visually
indistinguishable from those of a 24-bit frame buffer costing three
times more. This feature is useful for the following application
areas:

• Visual multimedia (JPEG, MPEG, etc.)

• Shaded mechanical CAD models

• Geographical imaging systems

• Document image management

• Visualization

• High-quality business graphics.

A block diagram of the HP Color Recovery pipeline is shown
in Fig. 12.

The HP Color Recovery encoding scheme causes no loss of
performance for rendering operations and is related to traditional
ordered dithering. Dithering is widely used to approximate
a large number of colors with an 8-bit frame buffer and
is also available in the graphics chip.

The HP Color Recovery decode is much more sophisticated
and based on advanced signal processing techniques. This
circuitry cycles at 135 MHz and achieves over 9 billion operations
per second. HP Color Recovery is described in
more detail in the article on page 51.

Multiple Color Lookup Tables. Typically, entry-level workstation
and personal computer graphics subsystems have
had only a single color lookup table with a limited number
of entries, usually 256. In the X Window System this results
in the annoying flashing of backgrounds or window contents
when a new application is started that takes colors from
existing applications. The graphics chip solves this problem
in a majority of cases by providing two 256-entry color
maps. For most interactions in which the user is focused on
a single application and the window manager this completely
eliminates the resource contention and results in a
visually stable screen (see Fig. 13).

Software Programmable Resolutions. One of the problems of
past workstation graphics subsystems is that they operate at
a fixed video resolution and refresh rate. This has posed
problems in configuring systems at the factory and during
customer upgrades. The graphics chip incorporates an advanced
digital frequency synthesizer that generates the
clocks necessary for the video subsystem. This synthesizer,
based on HP proprietary digital phase-locked loop technology,
allows software configurability of the resolution and
frequency of the video signal. Thus, alternate monitors can be
connected without changing any video hardware. Currently
supported configurations include:

• 640 by 480 pixels 60 Hz, standard VESA timing

• 800 by 600 pixels 60 Hz

• 1024 by 1024 pixels 75 Hz and flat panel

• 1280 by 1024 pixels 72 Hz.

As new monitor timings appear, the graphics chip can simply
be reprogrammed with the parameters associated with
the new monitor.

Summary

We created the graphics chip with the philosophies of
system-level-optimized design and optimal use of technology. This
enabled us to meet our goals of very low manufacturing
cost, leadership performance at our cost point, architectural
compatibility, and introduction of some important new
functionality.

and creative members of the graphics chip development team
in the graphics hardware and software laboratories in Fort
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thanks to Harry Baeverstad, Tony tiarkiinsn Ui\\ Basti.tiev, 
Dale Beucler, Randy Briggs, Joel Buck-Gengler, Mike Diehl,
Ales Fiala, Randy Fiscus, Dave Maitland, Bob Manley, Dave
McAllister, Peter Meier, John Metzner, Brian Miller, Gordon
Motley, Donovan Nickel, Cathy Pfister, Larry Thayer, Brad
Reak, Cal Selig, James Stewart, and Gayvin Stong for their

HP Color Recovery Technology

HP Color Recovery is a technique that brings true color capability to
interactive, entry-level graphics devices having only eight color planes.

by Anthony C. Barkans



For many years the only practical way to display high-quality
true color images was on a computer with a graphics subsystem
providing at least 24 color planes (see the definition
of true color on page 52). However, because of the high cost
of color graphics devices with 24 planes, many users chose
8-plane systems. Unfortunately, using these 8-plane systems
required giving up some color capabilities to save cost.



HP has developed a technique called HP Color Recovery
which provides a method for displaying millions of colors
within the cost constraints of an 8-plane system. For an example
of the image quality provided by HP Color Recovery
consider Fig. 1. Fig. 1a shows a close up of a jet plane stored
as a full 24-bit-per-pixel true color image. Fig. 1b shows the
same jet plane displayed using a traditional 8-bit-per-pixel
system. Finally, Fig. 1c shows how the jet plane will be displayed
when using HP Color Recovery in an 8-bit-per-pixel
mode on the HCRX-8 graphics device.

Fig. 1. A true color image and its dithered representations.
(a) True color 24-bit image. (b) Typical eight-bit graphics dithered
image. (c) An HP Color Recovery dithered image.

True Color

In this paper the term true color is used to define color reproduction such that the
underlying digital quantization of the color within an image is not discernible by
the human eye. In other words a continuous spectrum of color, such as in a rainbow,
can be displayed so that the color appears to vary continuously across the
image. In most computer graphics systems this is accomplished using 24 bits of
color information per pixel. With 24 bits, any single pixel can be displayed at one
of 2^24 (16.7 million) colors.

Some graphics systems may define true color to be represented by less than 24
bits per pixel.

Of course, pretty pictures aren't enough. Therefore, one of
the primary design goals for HP Color Recovery was to supply
the additional color capabilities without giving up interactive
performance. Another goal was to be able to work with
all types of applications running in a windowed environment
such as the X Window System and HP VUE. The implementation
of HP Color Recovery used in current HP workstations
meets these goals.

Traditional Eight-Plane Systems

Traditional eight-plane systems can display only 256 colors.
Two approaches have been employed to get the best results
with limited colors. The first is called either pseudo color or
indexed color. This method selects a set of 256 colors and
then limits the application to using only that fixed set of
colors. For many applications, such as word processing and
business graphics, this approach works reasonably well.
This is because the resultant images are made up of very
few colors. However, when an application needs more than
256 colors, such as realistically shaded MCAD (mechanical
computer-aided design) images or human faces in video sequences,
then another approach is needed. Since more than
256 colors are required for these applications, a technique to
simulate more colors is used. For these applications a technique
called dithering is employed. The idea of dither is to
approximate a single color by displaying two other colors at
intermixed pixel locations. For example, a grid of black and
white pixels can be displayed to simulate gray. Such a grid
of black and white pixels will indeed look gray when viewed
from a distance. The primary problem with dithering is that
since most people tend to work close to the display, dithered
images are viewed as having a grainy or textured appearance
(see Fig. 1b).

Color Theory and Dither 

Before discussing the details of how HP Color Recovery
works, an overview of color theory as it relates to computer-generated
images and dither should be helpful. This overview
describes how the human eye is tricked into seeing
color, color precision in graphics, and a dithering method.



Tricking the Human Eye

It is often noted that computer monitors use red, green, and
blue (RGB) to produce true color images. A reasonable
question to ask is: "Why use these particular colors?" If one
examines the spectrum of visible light, it can be seen that
red is at the end of the spectrum with the longest wavelengths
that the human eye can see while blue is at the other
end. Note that green is in about the middle. Also note that
white is a mix of all colors. Therefore by mixing varying
amounts of red, green, and blue any color can be created.
For example, forcing both the red and the green CRT beams
to be on at any single location will result in a dot that appears
yellow to the human eye.

Thus, one can create the visual appearance of any color by
mixing the red, green, and blue components at any pixel
location. However, it is interesting to note that the human
eye can also perceive a new color when the component colors
are mixed spatially. For example, a checkerboard of red
and green pixels will be perceived as yellow when viewed
from a distance. It is this spatial mixing of color to form a
new color that is exploited by dither.

Color Precision 

In most systems that deal with true color, color is specified
to eight bits for each of the three color components: red,
green, and blue. The choice of eight bits is based on two
factors. First, the human eye cannot distinguish an infinite
number of shades because the dynamic range of the eye is
limited. For the most part shaded surfaces rendered with
eight bits per color appear smooth with the underlying quantization
not readily apparent to the viewer. The second factor
that works in favor of using eight bits per color component
as a standard is that eight-bit bytes are very convenient
to work with in a computer system.

Simple Dithering 

When using a 24-bit color system, any displayable color
component can be specified using eight bits. For example,
consider the red component. When there is no red in a pixel
the red component is specified with a binary value of
00000000, which is a decimal 0. A full bright red is specified
as a binary number 11111111, which is decimal 255. Of
course, high-end display systems, such as the HCRX-48Z,
use 24 bits to store and display true color information. The
visual quality of these high-end displays is shown in Fig. 1a.
However, since low-cost systems typically have a total of
only eight bits per pixel to store the color information, an
approximation to the true color image is made. The most
common method is dither using three bits each for the red
and green components. This leaves two bits for blue. Using
fewer bits for blue is based on the fact that the human eye
has less sensitivity to blue. With fewer bits available per
color component, the quantization of the colors becomes
apparent to the viewer. The effects of using a limited number
of bits for each color can be seen in Fig. 1b.
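
The three-three-two split can be sketched as a simple truncation of each eight-bit component, before any dither is applied. The bit order within the byte (red in the top bits, blue in the bottom) is an assumption made for illustration; the text does not specify how the fields are arranged in the frame buffer.

```python
def pack_332(r, g, b):
    """Truncate 8-bit red, green, blue to 3, 3, and 2 bits and pack them in one byte."""
    return ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)


def unpack_332(pixel):
    """Split a packed byte back into its 3-bit red, 3-bit green, and 2-bit blue fields."""
    return (pixel >> 5) & 0b111, (pixel >> 2) & 0b111, pixel & 0b11


packed = pack_332(0b01011000, 0, 0)  # the red value used in the dither discussion
print(format(packed, "08b"), unpack_332(packed))
```

Plain truncation discards the low five bits of red, so every red value from 01000000 to 01011111 collapses to the same stored level, which is the quantization the viewer notices.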

Dithering approximates any color by using a combination of
colors at adjacent pixels. When viewed from a distance the
image appears to be the correct color. However, since dithered
systems can store only a limited number of bits in the
frame buffer, the primary task of the dithering logic is to
select the best set of values to use.

For dithering purposes it is convenient to think of each
eight-bit binary component of color as a number in a three
point five (3.5) representation. This representation means
there are three bits on the left side of the binary point and
five bits on the right side of the binary point. For example,
assume the true color value for red is given as the binary
number 01011000. In a 3.5 representation the number becomes
010.11000 binary, which is 2.75 decimal. Since the
final dithered values can only be three-bit integers, it can be
seen that using only numbers two and three would be desirable.
Ideally, the dither would set 3/4 of the pixels to three
and 1/4 to two.

If we consider the original color component as being an eight-bit value in a 3.5 format, then the dither values stored in the dither table should be evenly spaced between 000.00000 and 000.11111 (decimal 0.0 to almost decimal 1.0). The output of the table is added to the original eight-bit color component. Once the addition is complete the value is truncated to the desired number of bits for storage in the frame buffer. As a simple example assume that we are continuing to work with a red component that is originally specified as the binary number 01011000. In addition assume that we are using a 2 x 2 dither to reduce the original 8-bit color component to three bits. (The notation 2 x 2 dither means that the dither pattern will repeat in a 2 x 2 grid across the image.) To use a 2 x 2 dither, the least-significant bits of the X and Y window addresses of the pixel are used to index the dither table. The following example shows how a 2 x 2 dither is applied to one pixel of the true color value for red. Table I represents the values in a 2 x 2 dither table.

At the upper left of the window the X and Y addresses are both 0. To dither the data for this pixel location using our color value for red, we add the dither table entry for address (0,0) to 010.11000 and truncate the sum to three bits. Therefore at address 0,0 we would store a 011 binary in the frame buffer for red. Applying the above dither would result in three of the four pixels within every four-pixel block being stored in the frame buffer with a value of 011 (see Fig. 2). The fourth pixel in each block, the one with the LSB of Y set to a 1 and the LSB of X set to a 0, will have a 010 stored in the frame buffer. When a region of this color of red is viewed from a distance the color would appear to be the correct value of 010.11000. If the dithered jet plane shown in Fig. 1b is examined, it can be seen that it is dithered using a method similar to the one described above.

Fig. 2. Results after applying dither. Each box represents a pixel location on the display screen. For example, address (0,0) is defined as the upper left corner of the display. Also note that the numbers stored at each pixel location represent the results of applying the dither values given in Table I to a red component of color originally specified as 01011000 binary (2.75 in our 3.5 notation).
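The simple dither described above can be sketched in a few lines of Python. This is only an illustration: the table entries and their placement are assumptions (the four values 0.00, 0.25, 0.50, and 0.75 in 3.5 fixed point, arranged so that the pixel with the LSB of Y set and the LSB of X clear receives 010), and the names DITHER_2X2 and dither_component are hypothetical.

```python
# Hypothetical 2 x 2 dither table in 3.5 fixed point (5 fractional bits).
# The entries 0, 8, 16, 24 stand for 0.00, 0.25, 0.50, 0.75; their placement
# is an assumption chosen to reproduce the pattern described in the text.
DITHER_2X2 = [
    [16, 8],   # y LSB = 0: x LSB = 0, 1
    [0, 24],   # y LSB = 1: x LSB = 0, 1
]

def dither_component(value8, x, y, out_bits=3):
    """Dither one 8-bit color component down to out_bits bits."""
    d = DITHER_2X2[y & 1][x & 1]          # index by the LSBs of Y and X
    total = min(value8 + d, 0xFF)         # add dither, clamp to 8 bits
    return total >> (8 - out_bits)        # truncate to the stored width

red = 0b01011000                          # 2.75 in 3.5 format
block = [[dither_component(red, x, y) for x in range(2)] for y in range(2)]
print(block)                              # [[3, 3], [2, 3]]
```

Three of the four pixels come out as 3 (011) and one as 2 (010), so the block averages to 2.75. The clamp to 8 bits is an added assumption that keeps bright values from overflowing.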

From a distance the colors in the dithered image are integrated by the eye so that they appear correct. However, the fundamental problem with dither is that most dithered images are viewed up close and so the dithering pattern is noticeable in the image.

Dithering Is Key 

It is important to realize that to approximate any true color value, a spatial region of the screen is required. This often leads people to say that dithering is a method that trades off spatial resolution for color resolution. However, this is misleading. Some people believe that a single-pixel object cannot be dithered. Actually a single-pixel object can be dithered. The result is that the object will be one of the two dither colors. Going back to the example above, a single-pixel red object specified as binary 01011000 (decimal 2.75) will be stored at any single pixel location as either binary 010 or 011 (decimal two or three). Taken by itself, any single pixel is not a perfect approximation of the true color. However, it is still a reasonable approximation.

The idea of being able to encode each pixel in the image independently by using dither is key to enabling color recovery to work in an interactive environment. As a historical note it should be mentioned that over the last few years several people have developed methods to bring true color capabilities to eight-bit graphics devices. However, these attempts have been based on complex multipixel encoding schemes. For the most part they have applied data compression techniques to the data stored in the frame buffer. These methods have produced high-quality images, but the encoding is so complex that the user must give up interactive performance to use them. Because of the performance problems these methods have not been widely adopted by the computer graphics community.

HP Color Recovery 

The simplest explanation of HP Color Recovery is that it performs the task your eye is asked to do with an ordinary dithered system. In essence, an HP Color Recovery system takes 24-bit true color data generated by an application and dithers it down to eight bits for storage in the frame buffer. Then as the frame buffer data is scanned from the frame buffer to the display, it passes through specialized digital signal processing (DSP) hardware where the work of producing millions of colors is performed. The output of the DSP hardware is sent to the display where millions of colors can be viewed. It is important to recognize that since the data stored in the HP Color Recovery frame buffer is dithered, thousands of applications can work with it. It is also important to recognize that these applications will run at full performance in an interactive windowed environment. In other words, applications do not need to be changed to take advantage of HP Color Recovery.

The Process 

HP Color Recovery is a two-part process. First, true color information generated by the application is dithered and then stored in the frame buffer. The type of application generating the true color information is immaterial. For example, true color data can be generated by a CAD application program or as part of a video sequence. The dithering may be done in a software device driver or in the hardware of a graphics controller. It is very important to note that each pixel is treated independently. This pixel independence is key to the ability to work within an interactive windowed environment. The second part of the HP Color Recovery process is to filter the dithered data. The filter is placed between the output of the frame buffer and the DACs that drive the monitor. Fig. 3 shows the HP Color Recovery process starting from when an application generates true color data to when the image appears on the screen. Note that "application" refers to any program that generates true color data for display.



After the application generates the data, it is sent to the device driver. The function of the driver is to isolate the application from hardware dependencies. The driver is supplied by HP. It causes hardware dithering to be used when possible. However, there are times when the driver must perform the dither in software. It is important to note that compared to other dithered systems, there is no performance penalty suffered by an application using the HP Color Recovery dither.

The frame buffer stores the image data. Note that in most current systems the output of the dithered frame buffer is sent to the display, resulting in the common patterned appearance in the image. However, with HP Color Recovery, as the frame buffer data is scanned, it is sent through a specialized digital signal processing (DSP) circuit. The DSP is a sophisticated circuit that removes the patterning from the dithered image stored in the frame buffer. This circuit performs over nine billion operations per second. Despite this enormous amount of processing the circuit is surprisingly small. It is this small size that makes HP Color Recovery inexpensive enough to be considered for inclusion in low-end graphics systems.

The Dither Process 

In HP Color Recovery the quality of the displayed image depends on the dither used to encode the image. During the development of HP Color Recovery it was found that the size of the dither region determines how well a color can be recovered. It was found that from a region of 2^N pixels the technique can recover about N bits of color per component. Therefore, an eight-bit frame buffer that stores data in 3-3-2 format (3 bits each for red and green and 2 bits for blue) would need a dither region of 32 pixels for each color component to recover 5 additional bits. Thus, using a 32-pixel dither region, an area in the image of uniform color can have the same visual quality as an 8-8-7 image. For example, the sky behind the jet plane in Fig. 1c was recovered to within 1 bit of the original 24-bit true color data shown in the top image.
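The rule of thumb above (a region of 2^N pixels recovers about N bits per component) can be checked with a short calculation. This is a rough sketch; the function name recovered_bits and the cap at 8 bits per component are assumptions made for illustration.

```python
import math

def recovered_bits(stored_bits, region_pixels):
    """Approximate bits per component after recovery.

    Uses the rule of thumb from the text: a region of 2**N pixels
    recovers about N extra bits, capped at the 8 bits of the source.
    """
    extra = int(math.log2(region_pixels))
    return tuple(min(b + extra, 8) for b in stored_bits)

print(recovered_bits((3, 3, 2), 32))   # (8, 8, 7)
print(recovered_bits((3, 3, 2), 16))   # (7, 7, 6)
```

With a 32-pixel region the 3-3-2 frame buffer reaches the 8-8-7 quality quoted above, while a 16-pixel region would fall one bit short in every component.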

Most dithers use a 4 x 4 dither region. Since a 4 x 4 region covers only 16 pixels, a larger dither region is needed for HP Color Recovery. Therefore, a dither table with 32 entries organized as 2 x 16 was selected. (The reason for this odd shape is discussed later in this paper.) In addition, most dithers are as simple as the one described earlier in this paper. However, there are cases in which a simple dither does not work well. Note that in using the simple dither method described above, all true color values from binary 11100000 to 11111111 would dither to 111. For HP Color Recovery the dither table includes both positive and negative numbers. This improves the color range over which the dither is useful.

The HP Color Recovery dither is a little different from most dithers. However, it is on the same order of complexity. It should also be noted that the HP Color Recovery dither is included in the hardware of all HP graphics workstations that support this technique. This means that using the HP Color Recovery dither does not cause a decrease in performance.

The Filter Process 

In the example given earlier a red color component represented by the binary value 01011000 (2.75 in decimal) was used to illustrate simple dithering. For this example we used a 2 x 2 dither region in which the end result of the dither was that 3/4 of the pixels stored in the frame buffer were set to 3 (011) and 1/4 of the pixels were set to 2 (010). It is easy to see that if we average the four pixels in the 2 x 2 region we will recover the original color. This can be done as follows:

([value_1 x number_set_to_value_1] + [value_2 x number_set_to_value_2])/total_number_pixels

Using the example data we obtain: ([3 x 3] + [2 x 1])/4 = 2.75.
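The same weighted average can be written as a small helper. This is a minimal sketch; the function name region_average and the dictionary layout are assumptions chosen for illustration.

```python
def region_average(counts):
    """Average stored values, given {value: number_of_pixels_set_to_value}."""
    total_pixels = sum(counts.values())
    return sum(value * n for value, n in counts.items()) / total_pixels

print(region_average({3: 3, 2: 1}))   # 2.75
```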

This averaging works very well in regions of constant color, such as the sky behind the jet plane in Fig. 1. However, there is one fundamental issue that must be addressed for HP Color Recovery to be viable and that is how to handle edges in the image. If edges are not accounted for then the resultant image will blur. The two-dimensional representations of an area of a display screen shown in Fig. 4 are used to illustrate the problem of edge detection and the way the problem is addressed in HP Color Recovery.

As in Fig. 2, each box represents a pixel location on the display screen. In Fig. 4a the numbers represent the original true color data for one of the color components (e.g., red) in a 24-bit per pixel system. Fig. 4b shows the same region after simple dithering has been applied. Fig. 4c shows the pixel values after the application of HP Color Recovery. Fig. 4c pixel values represent the color data that would be displayed on the computer screen.

Region A in each of these figures is an area of constant color, whereas region B encompasses an edge. For illustration purposes, the dither region is again assumed to be 2 x 2 pixels.

The dithered color data shown in Fig. 4b is derived from the original color data shown in Fig. 4a and from using the simple dithering technique described in connection with Table I. The data shown in Fig. 4b is what would be stored in the frame buffer and displayed in a typical dithered system (e.g., Fig. 1b).

When it is time to display Pix_1 the data for the four pixels shown as Region A in Fig. 4b would be sent to the filter. The data stored in the region would be summed and then divided by the number of pixels in the region. The sum of the pixels in Region A is 11 and 11/4 = 2.75. Thus, the output of the filter when evaluating Pix_1 would be 2.75. This output value would be displayed on the computer display at Pix_1's location. Note that the output of the filter is the exact value of the original data at that point in Fig. 4a.

Fig. 4. (a) Pixel values for the original 24-bit per pixel color data. (b) The color data from Fig. 4a after it has been dithered and placed in the frame buffer. This would be the data displayed in a typical dithered system with the result appearing as in Fig. 1b. (c) The pixels from Fig. 4b after applying HP Color Recovery.

The next pixel along the scan line to be evaluated is Pix_2. The filter region for evaluating Pix_2 would include the two rightmost pixels of region A and the two leftmost pixels of region B (see Fig. 4b). Applying the filter operation for Pix_2 again results in the output value matching the value at that location in Fig. 4a (2.75).

If the evaluation is done on Pix_3, the pixels in region B would be summed and then divided by the number of pixels in the region, and the result would be 4.50. This value is very different from the original data value of 2.75 in Fig. 4a. Using the value of 4.50 at Pix_3 would result in edge smearing. To solve this problem a special edge detector that looks for edges in noisy data is used. The idea is to compare each pixel in the filter region with a value that is within ±1 of the pixel being evaluated. Since the data stored at Pix_3 is a 3, only pixels within region B that have a value of 2, 3, or 4 would pass the edge compare. The values that pass the edge detector are then summed and the total is divided by the number of pixels that pass the edge compare. For Pix_3, only Pix_3 and the pixel below it would pass the edge detector. Summing the two passing values together and dividing by 2 gives a result of 2.50. This value is slightly different from the original value of 2.75, but it is a better estimate of the original than the 4.50 obtained without the edge detection. The displayed values for the entire example region are shown in Fig. 4c.
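The edge compare for Pix_3 can be sketched as follows. The region contents are partly hypothetical: the text gives only the 3 stored at Pix_3, the passing pixel below it (taken here to be 2), and the plain average of 4.50, so the values 6 and 7 across the edge are assumed to make the numbers work. The function name edge_filter is also an assumption.

```python
def edge_filter(region, evaluated, tolerance=1):
    """Average only the pixels within +/-tolerance of the evaluated pixel."""
    passing = [p for p in region if abs(p - evaluated) <= tolerance]
    return sum(passing) / len(passing)

# Hypothetical contents for region B: only the 3 (Pix_3) and the 2 below it
# come from the text; the 6 and 7 are assumed values across the edge.
region_b = [3, 6, 2, 7]
print(sum(region_b) / len(region_b))        # 4.5, plain average smears the edge
print(edge_filter(region_b, evaluated=3))   # 2.5, edge-aware average
```

Because the pixel being evaluated always passes its own compare, the list of passing pixels is never empty.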

Software Considerations

Since dithered frame buffers are in common use today, many existing software applications can work with a dithered frame buffer. All of these applications could work with HP Color Recovery.

On products that use the Model 712's graphics chip, which is described in the article on page 43, and HP's Hyperdrive (HCRX), HP Color Recovery is supported. In these products we have chosen to have HP Color Recovery enabled as the default for 3D applications run in an eight-bit visual environment. Thus when using the 3D graphics libraries Starbase, PHIGS, or PEXlib and opening an application in an eight-bit visual environment with true color mode, HP Color Recovery will normally be enabled. Of course, setting an application to use a pseudo color map will disable HP Color Recovery and give the application the desired pseudo color capability. Because Xlib is tied into the pseudo color model rather than the 3D libraries, Xlib applications leave HP Color Recovery off by default. However, a mechanism is supported that allows HP Color Recovery to be enabled when using Xlib (see reference 1). The biggest change is that Xlib applications must do their own dithering.

Implementation

The implementation of HP Color Recovery was based on the assumption that color recovery would be most useful in entry-level graphics products. Entry-level graphics products are defined as products in which there is storage for only 8 bits per pixel in the frame buffer. These same products that benefit the most from HP Color Recovery are also the ones where product cost must be carefully controlled. Therefore, the implementation effort was driven with a strong sense of cost versus end user benefit.

Dither Table Shape. As mentioned earlier, the dither region shape used with HP Color Recovery is 2 x 16. The optimum shape would be closer to square, such as 4 x 8. However, the filter circuit needs storage for the pixels within the region. A 2 x 16 circuit requires that the current scan line's pixel and the data for the scan line above be available. This means that as data for any scan line enters the circuit, it is used to evaluate pixels on the current scan line. In addition, the data is saved in a scan line buffer so it can be used when evaluating the pixels on the next scan line (see Fig. 5). It should be noted that the storage for a scan line of data uses approximately one half of the circuit area in the current implementation. Therefore, if a 4 x 8 region had been used, three scan line buffers would have been required, almost doubling the cost of the HP Color Recovery logic.

Fig. 5. A block diagram of the HP Color Recovery filter circuit. (X = pixel being evaluated.)

Fig. 6. A simplified representation of the logic circuitry that exists in each of the logic blocks in Fig. 5. The inputs are the data for a pixel in the region and the data for the pixel being evaluated (stored in the register marked with an X in Fig. 5). If the data from the pixel in the region is within +1 or -1 of the pixel being evaluated, then it is passed to the adder tree. Otherwise, a copy of the pixel being evaluated is sent to the adder tree.
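The cost argument for the 2 x 16 shape can be illustrated with a rough estimate. The model below is an assumption, not a description of the actual circuit: it treats one scan line buffer as half of the 2 x 16 circuit area (the text says approximately one half) and assumes the remaining logic does not change with the region shape. The function names are hypothetical.

```python
def scan_line_buffers(rows):
    """A region spanning `rows` scan lines needs rows - 1 line buffers."""
    return rows - 1

def relative_area(rows, buffer_share=0.5):
    """Circuit area relative to the 2 x 16 design.

    Assumes one line buffer takes `buffer_share` of the 2 x 16 circuit
    (the text says approximately one half) and the rest is unchanged.
    """
    logic = 1.0 - buffer_share
    return logic + buffer_share * scan_line_buffers(rows)

print(scan_line_buffers(2), relative_area(2))   # 1 1.0
print(scan_line_buffers(4), relative_area(4))   # 3 2.0
```

Under these simplified assumptions a 4 x 8 region roughly doubles the area, which is in line with the estimate given in the text.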

Filter Function Logic. As explained earlier, the HP Color Recovery filter function averages the data within a region by summing the data for the pixels that pass an edge compare operation. The sum is then divided by the number of pixels that pass the edge compare. Typically, building the logic for a filter function like this is difficult and costly because it requires a divide circuit running at the video clock rate of 135 MHz. The HP Color Recovery filter function is implemented so that this is not a problem.

The implementation details of the filter function are complex. However, if we ignore the high-speed pipeline issues and some minor adjustments required to optimize image quality, we can reduce the implementation of the filter function to the following equation:

$$\sum_{i=1}^{k} \left[ (\text{Frame\_Buffer\_Data}_i)(W_i) + (\text{Evaluated\_Pixel})(\overline{W_i}) \right] \qquad (1)$$

where k is the number of pixels in the filter and W_i is a flag equal to one when a pixel passes an edge compare operation and zero when it doesn't ($\overline{W_i}$ is the complement of the flag). This flag can be thought of as the output of the comparator shown in Fig. 6.

The idea behind this equation is that if a pixel passes the edge compare, include it in the total. On the other hand, if a pixel fails the edge compare, then substitute the data for the pixel being evaluated for the failing pixel. The overriding assumption is that the pixel being evaluated is a reasonably good guess of the true color data. The worst case is that all the pixels around the sample fail the edge compare and the dithered color is used for that location. Since dithering uses a reasonable sample at each location this extreme case results in a reasonable image being displayed.

To see how this works let's look at two examples. In the first example assume that the pixel being evaluated is a single red dot specified using 01011000 binary (2.75 in our decimal numbering system). This is the same color used in some of the examples described earlier. However, this time let us assume that it is dithered to a value of 011. Also assume that this pixel is surrounded by green. Since the edge compare is done on a per-color basis, all the pixels in the region except the pixel being evaluated will fail the edge compare. In this case we will add a red value of 011 thirty-two times. The result out of the adder tree in Fig. 5 will have a red value of 01100000 (3.00 in decimal). Although this is not exact it will appear as a red dot in the middle of a green region. In other words, a reasonable approximation.

In the second example assume a region that is filled with red is specified with the same eight-bit binary value of 01011000. Also assume the simple dither method described earlier is used. In this case 3/4 of the pixels will be stored as 011. The other 1/4 will be stored as 010. Since none of the pixels fails the edge compare we will send twenty-four pixels with the value of 011 and eight with the value 010 to the adder tree. The result of the adder will be a binary value of 01011000 (2.75 decimal). In this case the output of HP Color Recovery will match the input true color data exactly.
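Both examples can be reproduced with a direct evaluation of equation 1. This is a minimal sketch rather than the hardware implementation: the function name filter_sum is hypothetical, the edge compare is taken to be the ±1 test described earlier, and the red value of 0 used for the surrounding green pixels is an assumption.

```python
def filter_sum(region, evaluated, tolerance=1):
    """Evaluate equation 1: passing pixels add themselves, failing
    pixels add a copy of the pixel being evaluated."""
    total = 0
    for pixel in region:
        w = 1 if abs(pixel - evaluated) <= tolerance else 0
        total += pixel * w + evaluated * (1 - w)
    return total

# Example 1: an isolated red pixel stored as 011; the other 31 pixels
# have a red value that fails the edge compare (0 is assumed here).
isolated = [3] + [0] * 31
print(format(filter_sum(isolated, 3), "08b"))   # 01100000

# Example 2: uniform region, 24 pixels stored as 011 and 8 as 010.
uniform = [3] * 24 + [2] * 8
print(format(filter_sum(uniform, 3), "08b"))    # 01011000
```

Because every one of the 32 positions always contributes a term, the sum of 32 three-bit values fits in eight bits and can be read directly as a 3.5 fixed-point number. This appears to be how the equation sidesteps the divide circuit mentioned above, although the text does not spell it out.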

Hardware Details. The filtering logic, which was shown in a systems context in Fig. 3, is expanded in Fig. 5. As the frame buffer is scanned, each pixel in the display is sequentially sent to the logic shown in Fig. 5. The left side of the figure shows the path taken as the data for each pixel read from the frame buffer enters the filtering logic. The data is sent both to a pipeline register for immediate use, and to a scan line buffer for use when the next scan line is being evaluated. The 32 registers shown in Fig. 5 store the data for the 2 x 16 region being evaluated. These registers are clocked at the pixel clock rate. Note that the data for each pixel on the display will pass through the location marked with the X. When a pixel is at the location X, it is called the pixel being evaluated. This means that the results of applying equation 1 are assigned to the display at the screen address of X.

The 32 pixels stored in the pipeline registers shown in Fig. 5 are sent through blocks of logic that perform the inner loop evaluation of equation 1. This inner loop is essentially an edge detector. The logic shown in Fig. 6 allows only pixels that have similar numeric values to the pixel being evaluated to be included in the summation. The summation logic is simply an adder tree that sums the results of the pixels passing the edge compare. The filter function is performed in parallel for all the pixels within the filter region.

Given the complexity of the function being performed in the filter circuit, the circuit is surprisingly small. The entire filter circuit is made up of approximately 35,000 transistors. Compared to the number of transistors required to increase the number of color planes, this is very small. For example, increasing the number of color planes from 8 to 16 on a typical SVGA (Super VGA) system (1024 x 768-pixel resolution) requires over 8,000,000 transistors, which is 1M bytes of additional frame buffer memory. Because of the small size of the HP Color Recovery circuit, it is inexpensive enough to be included in entry-level graphics systems.
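The memory figures in this comparison can be checked with simple arithmetic. The sketch below assumes one transistor per stored bit and that the extra planes are provided as a full 1M bytes (1024 x 1024 bytes) of memory; both are assumptions made for illustration, not statements from the text.

```python
width, height = 1024, 768
extra_planes = 16 - 8

needed_bits = width * height * extra_planes   # bits actually displayed
one_mbyte_bits = 1024 * 1024 * 8              # 1M bytes of added memory

print(needed_bits)      # 6291456
print(one_mbyte_bits)   # 8388608
print(one_mbyte_bits // 35_000)   # 239
```

Under these assumptions the added memory holds about 8.4 million bits, more than two hundred times the roughly 35,000 transistors quoted for the filter circuit.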

Questions and Answers 

Thus far the concepts behind HP Color Recovery have been discussed. It has been shown that HP Color Recovery can supply additional color capabilities to low-end graphics systems while maintaining an interactive windowed environment. The following are answers to the most frequently asked questions about the practical use of HP Color Recovery.

• Question: Is there a difference between a 24-bit true color image and one displayed using HP Color Recovery?

Answer: Yes. If you view a 24-bit image and an HP Color Recovery image side by side there are differences. For example, the back edge of the wing in Fig. 1c has some artifacts in it. At normal size the artifacts can be found but are less noticeable than in Fig. 1c.

• Question: How many colors are reproducible with HP Color Recovery?

Answer: In the best case HP Color Recovery can provide up to 23 bits of accuracy. However, in typical images about four million colors can be reproduced.

• Question: Are artifacts introduced by HP Color Recovery? 

Answer: In areas of very low contrast, artifacts will show up. Again the back edge of the wing in Fig. 1c is a good example.

• Question: Does HP Color Recovery look the same on all HP products that support it?

Answer: No. The first implementation was designed for the graphics chip used in the HP 9000 Model 712 workstation. After that design was finished some improvements were made which ended up in the HCRX family of graphics devices. These changes are hidden deep in the details of the implementation, enabling any application using HP Color Recovery on one product to work without change on the other products.

• Question: Do applications need to change to use HP Color 
Recovery? 

Answer: If the application was written using a 3D application program interface the answer is no. Of course it must be running in an eight-bit visual environment on a device that supports HP Color Recovery; in addition, the application must have been written to use the 24-bit true color model. However, if the application was written using Xlib then it must be changed to do the dithering. Details can be found in reference 1.

• Question: Is there a way to turn HP Color Recovery off?

Answer: Yes. Set the environment variable HP_DISABLE_COLOR_RECOVERY to any value.

• Question: What happens to the color map in the HP 9000 Model 712's graphics chip when HP Color Recovery is enabled?

Answer: In the graphics chip there are two hardware color maps. By default, the X11 server permanently downloads the default color map into one of these hardware color maps. If HP Color Recovery is enabled the remaining color map is used by HP Color Recovery. See the article on page 43 for more information about these color maps.

• Question: What happens to the color map on HCRX graphics when HP Color Recovery is enabled?

Answer: On HCRX graphics devices there are two hardware color maps in the overlay planes and two in the image planes. By default, the X11 server permanently downloads the default color map into one of the overlay planes' hardware color maps. This is true in each of the following configurations:
o The HCRX-8 and HCRX-8Z frame buffer configurations with no transparency have one hardware color map in the overlay planes and two in the image planes that are available. In this configuration the HP Color Recovery color map can be downloaded into any of the available hardware color maps.
o The HCRX-8 and HCRX-8Z frame buffer configurations with transparency have only one hardware color map in the overlay planes and only one in the image planes. Since the hardware color map for the overlay planes already has the default color map loaded into it, there is only one color map available for HP Color Recovery to choose from. Therefore, in this configuration the HP Color Recovery color map is downloaded into the remaining hardware color map.
o The HCRX-24 and HCRX-24Z frame buffer configurations with or without transparency have one hardware color map in the overlay planes and two in the image planes that are available. In this configuration, when using an eight-bit visual depth the HP Color Recovery color map can be downloaded into any of the available hardware color maps.

• Question: Does HP Color Recovery work with logical raster 
operations? 

Answer: Yes. Like any dithered frame buffer system, HP Color Recovery works with raster operations such as AND, OR, and XOR.

• Question: How do image processing applications interact with HP Color Recovery?

Answer: There are two basic classes of image processing applications: feature finding and image enhancement.

o Feature finding. Most feature-finding applications are based on edge detection. The results of running one of these types of applications can be displayed using HP Color Recovery. However, as with other dithered frame buffers, any application using the frame buffer as the image source may have problems if it does not account for the dither.

o Image enhancement. Image enhancement applications are typically used to enhance images for the human visual system. The goal of many of these applications is to bring out low-level features of the image. It is possible to preprocess the image and send it to HP Color Recovery. However, if there is a need for an extremely high-quality image (e.g., medical imaging) a 24-bit frame buffer may be necessary.

• Question: If an image is dithered using a dither method other than the one developed for HP Color Recovery, can it be displayed on a system that supports HP Color Recovery?

Answer: Yes. One option is to turn HP Color Recovery off. However, the image can be processed with HP Color Recovery on. In this case the image will be viewable. The image quality will be comparable to viewing the image on a typical dithered system, but the dithering artifacts will be replaced with a new set of artifacts.

• Question: Can an image created using the HP Color Recovery dither method be viewed on an eight-bit system that does not support HP Color Recovery?

Answer: Yes. However, it is important to realize that without the HP Color Recovery back end the dithering artifacts will be visible in the image.

• Question: Can a user read the frame buffer data?

Answer: Yes. However, as with any dithered system there is the issue of precision. For example, if the red data is generated with eight bits of precision, then the readback will give a three-bit dithered value for the data. The data on readback is not the same as the eight-bit value generated by the application.

• Question: Does HP Color Recovery work with multimedia applications?

Answer: Yes. By removing the dithering artifacts, image quality during MPEG (video) playback is improved.

• Question: Does HP Color Recovery impact application performance?

Answer: No. The HP Color Recovery dither is implemented in fast hardware in both the Model 712's graphics chip and the HCRX graphics subsystem. When hardware dithering cannot be used, such as with virtual memory double buffering, a software dither is performed by the device driver. Since the dither is the same complexity as common dithers, there is no performance penalty for using HP Color Recovery when compared to using other dithered systems.

In addition, the DSP circuit in the back end is placed in the path of the data being scanned into the monitor. As such the DSP does get in the path (without affecting application performance) when the system is performing what the user sees as interactive tasks.

• Question: Can an image generated using HP Color Recovery 
be displayed on output devices other than monitors (e.g., 
printers)? 



Answer: Many applications generate a print file. In this case the data displayed on the monitor is not used to create the print file. Therefore, HP Color Recovery will not interfere with the output. Another method used to generate hardcopy is a screen dump. Unfortunately, a complete solution for dumping a color-recovered image to a printer is not available yet.
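As a practical footnote to the question above about turning HP Color Recovery off, the environment variable can be set for a single program launched from a script. This is a hedged sketch: only the variable name HP_DISABLE_COLOR_RECOVERY comes from the answer, while the helper name and the sample DISPLAY value are assumptions for illustration.

```python
import os

def env_without_color_recovery(base_env=None):
    """Return a copy of the environment with HP Color Recovery disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["HP_DISABLE_COLOR_RECOVERY"] = "1"   # any value works, per the text
    return env

env = env_without_color_recovery({"DISPLAY": ":0"})
print(env)   # {'DISPLAY': ':0', 'HP_DISABLE_COLOR_RECOVERY': '1'}
# A launcher could then pass it on, e.g. subprocess.run([app], env=env),
# where `app` is whatever program should start with the feature off.
```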

Conclusion

Color recovery brings added color capabilities to entry-level systems. Since the technology is based on dither, these additional color capabilities can be brought to an entry-level system while maintaining an interactive environment that supports many current applications.

Acknowledgments 

Many people have helped transform HP Color Recovery from an idea into a reality. The list would be too long to print here. Without the list of names I hope everyone involved knows that I appreciate their efforts. However, there are several people that I must list by name. These people are Paul Martin, Larry Thayer, Brian Miller, and Randy Fiscus. Additionally, a special thanks goes to Dave McAllister who took my notes and turned them into real logic. Along the way, Dave found many innovations that led to a better design.

Reference 

1. HP Color Recovery Technology, HP publication number 5962-9835E.






Real-Time Software MPEG Video 
Decoder on Multimedia-Enhanced PA 
7100LC Processors 

With a combination of software and hardware optimizations, including the availability of PA-RISC multimedia instructions, a software video player running on a low-end workstation is able to play MPEG compressed video at 30 frames/s.

by Ruby B. Lee, John R Beck, Joel Lamb, and Kenneth E. Severson 



Traditionally, computers have improved productivity by helping people compute faster and more accurately. Today, computers can further improve productivity by helping people communicate better and more naturally. Towards this end, at Hewlett-Packard we have looked for more natural ways to integrate communication power into our desktop machines, which would allow a user to access distributed information more easily and communicate with other users more readily.

We felt that adding audio, images, and video information would enrich the information media of text and graphics normally available on desktop computers such as workstations and personal computers. However, for such enriched multimedia communications to be useful, it must be fully integrated into the user's normal working environment. Hence, as the technology matured we decided to integrate increasing levels of multimedia support into both the user interface and the basic hardware platform.

In terms of user interface, we integrated a panel of multimedia
icons into the HP VUE standard graphical user interface, which comes
with all HP workstations. These multimedia icons are part of the HP
MPower product. HP MPower enables a workstation user to receive and
send faxes, share printers, access and manipulate images, hear and
send voice and CD-quality stereo audio, send and receive multimedia
email, share an X window or an electronic whiteboard with other
distributed users, and capture and play back video sequences. The HP
MPower software is based on a client/server model, in which one
server can service around 20 clients, which can be workstations or X
terminals.

In terms of hardware platforms, we integrated successive levels of
multimedia support into the baseline PA-RISC workstations. First, we
integrated support for all the popular image formats such as JPEG
(Joint Photographic Experts Group)† compressed images. Then, we
added hardware and software support for audio, starting with 8-kHz
voice-quality audio, followed by support for numerous audio formats
including A-law, μ-law, and 16-bit linear mode, with up to 48-kHz
mono and stereo. This allowed high-fidelity, 44.1-kHz stereo, 16-bit
CD-quality audio to be recorded, manipulated, and played back on HP
workstations. At the same time, we supported uncompressed video
capture and playback.

† JPEG is an international digital image compression standard for
continuous-tone (multilevel) still images (grayscale and color).

In January 1994, HP introduced HP MPower 2.0 and the entry-level
enterprise workstation, the HP 9000 Model 712, which is based on the
multimedia-enhanced PA-RISC processor known as the PA 7100LC. The
video player integrated in the MPower 2.0 product is the first
product that achieves real-time MPEG-1 (Moving Picture Experts
Group) video decompression via software running on a general-purpose
processor. Typically, real-time MPEG-1 decompression is achieved via
special-purpose chips or boards. Previous attempts at software
MPEG-1 decompression did not attain real-time rates. The fact that
this is achieved by the low-end Model 712 workstation is
significant.

In this paper, we discuss the support of MPEG-compressed video as a
new (video) data type. In particular, we discuss the technology that
enables the video player integrated into the HP MPower 2.0 product
to play back MPEG-compressed video at real-time rates of up to 30
frames per second.

Digital Video Standards

We decided to focus on the MPEG digital video format because it is
an ISO (International Standards Organization) standard, and it gives
the highest video fidelity at a given compression ratio of any of
the formats that we evaluated. MPEG also has broad support from the
consumer electronics, telecommunications, cable, and computer
industries. The high compression capability of MPEG translates into
lower storage costs and less bandwidth needed for transmitting video
on the network. These characteristics make MPEG an ideal format for
addressing the need for detail in the video used in technical
workstation markets and computer-based training in commercial
workstation markets.

MPEG is one of several algorithmically related standards shown in
Fig. 1. All of these digital video compression standards use the
discrete cosine transform (DCT) as a fundamental component of the
algorithm. Alternatives to discrete cosine-based algorithms that we
looked at include vector quantization, fractals, and wavelets.
Vector quantization algorithms are popular on older computer
architectures because they require less computing power to
decompress, but this advantage is offset by poorer image quality at
low bandwidth (high compression) compared to MPEG for practical
vector quantization methods. Algorithms based on wavelet and fractal
technology have the potential to deliver video fidelity comparable
to MPEG, but there is presently a lack of industry consensus on
standardization, a key requirement for our use.

Fig. 1. Digital video standards based on the discrete cosine
transform.

Another advantage of a high-performance implementation of MPEG is
the ability to leverage the improvements to the other DCT-based
algorithms. Although the relationships shown in Fig. 1 do not
represent a true hierarchy of algorithms, the figure is useful for
illustrating increased complexity as one moves from JPEG to MPEG-2,
or from H.261 to MPEG-2.

All of these formats have much in common, such as the use of the DCT
for encoding. The visual fidelity of the algorithms was the key
selection criterion and not ease of implementation or performance on
existing hardware.

Although JPEG supports both lossy and lossless compression, the term
JPEG is typically associated with the lossy specification.† The
primary goal of JPEG is to achieve high compression of photographic
images with little perceived loss of image fidelity. Although it is
not an ISO standard, by convention, a sequence of JPEG lossy images
to create a digital video sequence is called motion JPEG, or MJPEG.

H.261 is a digital video standard from the telecommunications
standards body ITU-TSS (formerly known as CCITT). H.261 is one of a
suite of conferencing standards that make up the umbrella H.320
specification. H.261 is often referred to as P×64 (where P is an
integer) because it was designed to fit into multiples of 64 kbits/s
bandwidth. The first frame (image) of an H.261 sequence is for all
practical purposes a highly compressed lossy JPEG image. Subsequent
frames are built from image fragments (blocks) that are either
JPEG-like or are differences from the image fragments in previous
frames. Most video sequences have high frame-to-frame coherence.
This is especially true for video conferencing. Because the encoding
of the movement of a piece of an image requires less data than an
equivalent JPEG fragment, H.261 achieves higher visual fidelity for
a given bandwidth than does motion JPEG. Since the encoding of the
differences is always based on the previous frames, the technique is
called forward differencing.

† In lossless compression, decompressed data is identical to the
original image data. In lossy compression, decompressed data is a
good approximation of the original image data.

The MPEG-1 specification goes even further than H.261 in allowing
sophisticated techniques to achieve high fidelity with fewer bits.
In addition to forward differencing, MPEG-1 allows backward
differencing (which relies on information in a future frame) and
averaging of image fragments. (Forward and backward differencing are
described in more detail in the next section.) MPEG-1 achieves
quality comparable to a professionally reproduced VHS videotape even
at a single-speed CD-ROM data rate (1.5 Mbits/s). MPEG-1 also
specifies encodings for high-fidelity audio synchronized with the
video.

MPEG-2 contains additional specifications and is a superset of
MPEG-1. The new features in MPEG-2 are targeted at broadcast
television requirements, such as support for frame interleaving
similar to analog broadcast techniques. With widespread deployment
of MPEG-2, the digital revolution for video may be comparable to the
digital audio revolution of the last decade.

The approximate bandwidths required to achieve a level of subjective
visual fidelity for motion JPEG, H.261, MPEG-1, and MPEG-2 are shown
in Fig. 2. Motion JPEG will primarily be used for cases in which
accurate frame editing is important, such as video editing. H.261
will be used primarily for video conferencing, but it also has
potential for use in video mail. MPEG-1 and MPEG-2 will be used for
publishing, where fidelity expectations have been set by consumer
analog video tapes, computer-based training, games, movies on CD,
and video on demand.

MPEG Compression 

MPEG has two classes of frames: intracoded and nonintracoded frames
(see Fig. 3). Intracoded frames, also called I-frames, are
compressed by reducing spatial redundancy within the frame itself.
I-frames do not depend on comparisons with past or "reference"
frames. They use JPEG-type compression for still images.

Fig. 3. MPEG frame sequencing.



Nonintracoded frames are further divided into P-frames and B-frames.
P-frames are predicted frames based on comparisons with an earlier
reference frame (an intracoded or predicted frame). By considering
temporal redundancy in addition to spatial redundancy, P-frames can
be encoded with fewer bits. B-frames are bidirectionally predicted
frames that require one backward reference frame and one forward
reference frame for prediction. A reference frame can be an I-frame
or a P-frame, but not a B-frame. By detecting the motion of blocks
from both a frame that occurred earlier and a frame that will be
played back later in the video sequence, B-frames can be encoded in
fewer bits than I- or P-frames.

Each frame is divided into macroblocks of 16 by 16 pixels for the
purposes of motion estimation† in MPEG compression and motion
compensation in MPEG decompression. A frame with only I-blocks is an
I-frame, whereas a P-frame has P-blocks or I-blocks, and a B-frame
has B-blocks, P-blocks, or I-blocks. For each P-block in the current
frame, the block in the reference frame that matches it best is
identified by a motion vector. Then the differences between the
pixel values in the matching block in the reference frame and the
current block in the current frame are encoded by a discrete cosine
transform.

The color space used is the YCbCr color representation rather than
the RGB color space, where Y represents the luminance (or
brightness) component, and Cb and Cr represent the chrominance (or
color) components. Because human perception is more sensitive to
luminance than to chrominance, the Cb and Cr components can be
subsampled in both the x and y dimensions. This means that there is
one Cb value and one Cr value for every four Y values. Hence, a
16-by-16 macroblock contains four 8-by-8 blocks of Y, and only one
8-by-8 block of Cb and one 8-by-8 block of Cr values (see Fig. 4).
This is a reduction from the twelve 8-by-8 blocks (four for each of
the three color components) if Cb and Cr were not subsampled. The
six 8-by-8 blocks in each 16-by-16 macroblock then undergo transform
coding.

† Motion estimation uses temporal redundancy to estimate the
movement of a block from one frame to the next.
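The block counts quoted above follow directly from the macroblock
geometry. The short Python sketch below reproduces them; the helper
name blocks_per_macroblock and its parameters are illustrative
assumptions, not part of the decoder described in this article.

```python
def blocks_per_macroblock(mb_size=16, block_size=8, chroma_subsample=2):
    """Count the 8-by-8 blocks in one macroblock (illustrative helper).

    chroma_subsample is the subsampling factor applied to Cb and Cr in
    both x and y; 1 means no subsampling.
    """
    per_side = mb_size // block_size
    y_blocks = per_side * per_side                    # luminance blocks
    chroma_side = mb_size // chroma_subsample // block_size
    chroma_blocks = chroma_side * chroma_side         # blocks per chroma plane
    return y_blocks + 2 * chroma_blocks               # Y + Cb + Cr


subsampled = blocks_per_macroblock(chroma_subsample=2)
full = blocks_per_macroblock(chroma_subsample=1)
```

With subsampling by two in each dimension the count is six blocks per
macroblock; without it the count is twelve.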

Transform coding concentrates energy in the lower frequencies. The
transformed data values are then quantized by dividing by the
corresponding quantization coefficient. This results in discarding
some of the high-frequency values, or lower-frequency but low-energy
values, since these become zeros. Both transform coding and
quantization enable further compression by run-length encoding of
zero values.
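To make the interplay of quantization and run-length encoding
concrete, here is a minimal Python sketch. It is a simplification
under stated assumptions: the coefficient values and quantization
table are made up for illustration, division simply truncates toward
zero, and the (zero_run, value) pair format is a generic run-length
scheme rather than the exact MPEG bitstream syntax.

```python
def quantize(coeffs, qtable):
    """Divide each coefficient by its quantization coefficient,
    truncating toward zero (an assumed rounding rule)."""
    return [int(c / q) for c, q in zip(coeffs, qtable)]


def run_length_zeros(values):
    """Encode values as (zero_run, nonzero_value) pairs.

    Returns the list of pairs and the number of trailing zeros.
    """
    pairs = []
    run = 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs, run


# Hypothetical transformed values (low frequencies first) and table.
coeffs = [620, -45, 30, 7, -3, 2, 1, 0]
qtable = [16, 11, 10, 16, 24, 40, 51, 61]

quantized = quantize(coeffs, qtable)
pairs, trailing_zeros = run_length_zeros(quantized)
```

In this example the five small high-frequency values all quantize to
zero, so only three nonzero values and one trailing run remain to be
coded.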

Finally, the nonzero coefficients of an 8-by-8 block used in the
discrete cosine transform can be encoded via variable-length entropy
encoding such as Huffman coding. Entropy encoding basically removes
coding redundancy by assigning the code words with the fewest number
of bits to those coefficients that occur most frequently.



Fig. 4. Subsampling of the chrominance components (Cb, Cr) with
respect to the luminance (Y) component. In the figure, X marks
luminance (Y) samples and a second marker denotes chrominance
(Cb, Cr) samples.

MPEG Decompression

MPEG decompression reverses the functional steps taken for MPEG
compression. There are six basic steps involved in MPEG
decompression.

1. The MPEG header is decoded. This gives information such as
picture rate, bit rate, and image size.

2. The video data stream is Huffman or entropy decoded from
variable-length codes into fixed-length numbers. This step includes
run-length decoding of zeros.

3. Inverse quantization is performed on the numbers to restore them
to their original range.

4. An inverse discrete cosine transform is performed on the 8-by-8
blocks in each frame. This converts from the frequency domain back
to the original spatial domain. This gives the actual pixel values
for I-blocks, but only the differences for each pixel for P-blocks
and B-blocks.

5. Motion compensation is performed for P-blocks and B-blocks. The
differences calculated in step 4 are added to the pixels in the
reference block as determined by the motion vector for P-blocks and
to the average of the forward and backward reference blocks for
B-blocks.

6. The picture is displayed by doing a color conversion from YCbCr
coordinates to RGB color coordinates and writing to the frame
buffer.
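Step 5 can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: blocks are flat lists of pixel
values, the reference blocks are taken to be already located by their
motion vectors, the B-block average is truncated, and the result is
clipped to the 8-bit range 0 to 255. The function name
motion_compensate is hypothetical.

```python
def motion_compensate(diff, forward_ref, backward_ref=None):
    """Rebuild a block from its step-4 differences and reference block(s).

    With one reference block this models a P-block; with two it models
    a B-block, whose prediction is the average of both references.
    """
    if backward_ref is None:
        prediction = forward_ref
    else:
        # Assumed rounding: truncating integer average.
        prediction = [(f + b) // 2 for f, b in zip(forward_ref, backward_ref)]
    # Assumed clipping to the 8-bit pixel range.
    return [max(0, min(255, p + d)) for p, d in zip(prediction, diff)]


p_block = motion_compensate([5, -3], [100, 200])
b_block = motion_compensate([1, 0], [100, 200], [102, 100])
```

The clipping step is the kind of operation that the saturation
arithmetic described later in this article performs without extra
instructions.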

Methodology 

Our philosophy was to improve the algorithms and tune the software
first, resorting to hardware support only if necessary. We set a
goal of 10 to 15 frames/s for software MPEG video decompression
because this is the rate at which motion appears smooth rather than
jerky.

We started by measuring the performance of the MPEG software we had
purchased. This software initially took two seconds to decode one
frame (0.5 frame/s) on an older 50-MHz Model 720 workstation. This
decoding was for video only and did not include audio. Profiling
indicated that the inverse discrete cosine transform (step 4) took
the largest chunk of the execution time, followed by display (step
6), followed by motion compensation (step 5). The decoding of the
MPEG headers was insignificant.

With this data we set out to optimize every step in the MPEG
decompression software. After we applied all the algorithm
enhancements and software tuning, we measured the MPEG decode
software again. While we had achieved an order of magnitude
improvement, the rate of 4 to 5 frames/s was not sufficient to meet
our goal.

Hence, we looked at possible multimedia enhancements to the basic
PA-RISC processor and other system-level enhancements that would not
only speed up MPEG decoding, but also be generally useful for
improving performance in other computations. In addition, any chip
enhancements we added could not adversely impact the design
schedule, complexity, cycle time, and chip size of the PA-RISC
processor we were targeting, the PA 7100LC, which was already deep
into its implementation phase at the time. The PA 7100LC is
described in detail in the article on page 12.

We approached this problem by studying the distribution of
operations executed by the software MPEG decoder. Then, we found
ways to reduce the execution time of the most frequent operation
sequences. The application of algorithm enhancements, software
tuning, and projected hardware enhancements was iterated until we
attained our goal of being able to decompress at a rate greater than
15 frames/s via software.

Algorithm and Software Optimizations

In terms of MPEG video algorithms, we improved on the Huffman
decoder, the motion compensation, and the inverse discrete cosine
transform. A faster Huffman decoder based on a hybrid of table
lookup and tree-based decoding is used. The lookup table sizes were
chosen to reduce cache misses. For motion compensation, we sped up
the pixel averaging operations.

For the inverse discrete cosine transform, we use a faster Fourier
transform, which significantly reduces the number of multiplies for
each two-dimensional 8-by-8 inverse discrete cosine transform. In
addition, we use the fact that the 8-by-8 inverse transform matrices
are frequently sparse to further reduce the multiplies and other
operations required.

The MPEG audio decompression is also done in software. This
algorithm was improved by using a 32-point discrete cosine transform
to speed up the subband filtering.

In terms of software tuning, we "flattened" the code to reduce the
number of procedure calls and returns, and the frequent building up
and tearing down of contexts present in the original MPEG code. We
also did "strength reductions" like reducing multiplications to
simpler operations such as shift and add or table lookup.
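As an illustration of strength reduction, the Python sketch below
replaces a multiplication by the constant 10 with two shifts and an
add, and replaces a repeated computation with a table lookup. The
constant, the squaring table, and the function names are arbitrary
examples chosen here, not values taken from the decoder.

```python
def times10_shift_add(x):
    """Compute 10*x as 8*x + 2*x using only shifts and an add."""
    return (x << 3) + (x << 1)


# Table lookup: precompute results for every possible 8-bit input once,
# then replace the arithmetic with an index operation.
SQUARE_TABLE = [i * i for i in range(256)]


def square_lookup(pixel):
    """Return pixel*pixel for an 8-bit value via table lookup."""
    return SQUARE_TABLE[pixel]
```

Both forms give the same results as the multiplications they replace;
the benefit on a real processor depends on the relative cost of
multiply, shift, add, and load instructions.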

The last column of Table I shows the percentage of execution time
spent in each of the six MPEG decompression steps after the
algorithm and software tuning improvements were made. The first two
columns of Table I show the millions of instructions executed in
each of the six decompression steps and the percent of the total
instructions executed (path length) each step represents. The input
video sequence was an MPEG-compressed clip of a football game. The
total time taken was 7.45 seconds on an HP 9000 Model 735 99-MHz
PA-RISC workstation, with 256K bytes of instruction cache and 256K
bytes of data cache.

Table I
Instructions and Time Spent in Each MPEG Decompression Step
on an HP 9000 Model 735

Millions of Instructions    Path Length (%)    Time (%)

The largest slice of execution time (38.7%) and the largest chunk of
instructions executed (38.3%) were still the inverse discrete cosine
transform. We studied the frequencies of generic operations in this
group and attempted to execute them faster. This resulted in new
PA-RISC processor instructions for accelerating multimedia software.

PA-RISC Processor Enhancements 

The new processor multimedia instructions implemented in the PA
7100LC processor allow simple arithmetic operations to be executed
in parallel on subword data in the standard integer data path. In
particular, the integer ALU is partitioned so that it can execute a
pair of arithmetic operations in a single cycle with a single
instruction. The arithmetic operations accelerated in this way are
add, subtract, average, shift left and add, and shift right and add.
The latter two operations are effective in implementing
multiplication by constants.

PA-RISC Multimedia Extensions 1.0. The PA 7100LC PA-RISC processor
chip contains some instructions that operate independently and in
parallel on two 16-bit data fields within a 32-bit register. These
operations are independent in that bits carried or shifted out of
one of the fields never affect the result in the other field. These
operations occur in parallel in that a single instruction computes
both 16-bit fields of the result. Table II summarizes these
instructions.

HADD does two parallel 16-bit additions on the left and the right
halves of registers ra and rb, placing the two 16-bit results into
the left and right halves of register rt.

HSUB does two parallel 16-bit subtractions on the left and right
halves of registers ra and rb, placing the two 16-bit results into
the left and right halves of register rt.

Both HADD and HSUB perform modulo arithmetic (modulus 2^16), that
is, the result wraps around from the largest number back to the
smallest number and vice versa. This is the usual mode of operation
of twos complement adders when overflow is ignored.
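The modulo behavior of the two halfword operations can be modelled in
Python as follows. This is a behavioral sketch only: registers are
represented as 32-bit nonnegative integers, and the function names
hadd and hsub are used here for convenience rather than as a
statement of how the instructions are implemented.

```python
MASK16 = 0xFFFF


def hadd(ra, rb):
    """Two independent 16-bit additions, each modulo 2**16."""
    left = ((ra >> 16) + (rb >> 16)) & MASK16
    right = ((ra & MASK16) + (rb & MASK16)) & MASK16
    return (left << 16) | right


def hsub(ra, rb):
    """Two independent 16-bit subtractions, each modulo 2**16."""
    left = ((ra >> 16) - (rb >> 16)) & MASK16
    right = ((ra & MASK16) - (rb & MASK16)) & MASK16
    return (left << 16) | right


# The right halves overflow (0xFFFF + 1), but nothing carries into the
# left halves, unlike an ordinary 32-bit addition.
packed_sum = hadd(0x0001FFFF, 0x00010001)
plain_sum = (0x0001FFFF + 0x00010001) & 0xFFFFFFFF
```

Here the packed sum is 0x00020000, whereas a plain 32-bit add of the
same registers gives 0x00030000 because the carry crosses the
halfword boundary.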

HADD and HSUB also have two saturation arithmetic options. With the
signed saturation option, HADD.ss, both operands and the result are
considered signed 16-bit integers. If the result cannot be
represented as a signed 16-bit integer, it is clipped to the largest
positive value (2^15 - 1) if positive overflow occurs, or it is
clipped to the smallest negative value (-2^15) if negative overflow
occurs.

With the unsigned saturation option, HADD.us, the first operand (ra)
is considered an unsigned 16-bit integer, the second operand (rb) is
considered a signed 16-bit integer, and the result (in rt) is
considered an unsigned 16-bit integer. If the result cannot be
represented as an unsigned 16-bit integer, it is clipped to the
largest unsigned value (2^16 - 1) if positive overflow occurs, or it
is clipped to the smallest unsigned value (0) if negative overflow
occurs.

The signed saturation and unsigned saturation options for parallel
halfword subtraction are defined similarly.
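The two saturation options for halfword addition can be modelled in
Python as shown below. This is a behavioral sketch built from the
definitions in the text; the helper names (to_signed16, hadd_ss_half,
hadd_us_half, hadd_sat) are assumptions introduced for illustration.

```python
def to_signed16(h):
    """Interpret a 16-bit pattern as a twos complement signed integer."""
    return h - 0x10000 if h & 0x8000 else h


def hadd_ss_half(a, b):
    """Signed saturation: both operands and the result are signed."""
    s = to_signed16(a) + to_signed16(b)
    s = max(-2**15, min(2**15 - 1, s))
    return s & 0xFFFF          # back to a 16-bit pattern


def hadd_us_half(a, b):
    """Unsigned saturation: a is unsigned, b is signed, result unsigned."""
    s = a + to_signed16(b)
    return max(0, min(2**16 - 1, s))


def hadd_sat(ra, rb, half_op):
    """Apply half_op independently to the left and right halfwords."""
    left = half_op(ra >> 16, rb >> 16)
    right = half_op(ra & 0xFFFF, rb & 0xFFFF)
    return (left << 16) | right


signed_result = hadd_sat(0x7FFF8000, 0x0001FFFF, hadd_ss_half)
unsigned_result = hadd_sat(0xFFF00005, 0x0020FFF6, hadd_us_half)
```

In the signed example the left half (32767 + 1) clips to 0x7FFF and
the right half (-32768 - 1) clips to 0x8000. In the unsigned example
the left half (65520 + 32) clips to 0xFFFF and the right half (5 - 10)
clips to 0.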

HAVE, or halfword average, gives the average of each pair of
halfwords in ra and rb. It takes the sum of parallel halfwords and
does a right shift of one bit before storing each 16-bit result into
rt. During the one-bit right shift, the carry is shifted in on the
left and unbiased rounding* is performed on the least-significant
bit on the right. Because the carry is shifted in, no overflow can
occur in the HAVE instruction.

Table II
PA-RISC Multimedia Instructions in PA 7100LC

ra contains a1; a2
rb contains b1; b2
rt contains t1; t2

Instruction           Parallel Operation

HADD ra,rb,rt         t1 = (a1+b1) mod 2^16;
                      t2 = (a2+b2) mod 2^16;

HADD.ss ra,rb,rt      t1 = IF (a1+b1) > (2^15-1) THEN (2^15-1)
                           ELSEIF (a1+b1) < -2^15 THEN (-2^15)
                           ELSE (a1+b1);
                      t2 = IF (a2+b2) > (2^15-1) THEN (2^15-1)
                           ELSEIF (a2+b2) < -2^15 THEN (-2^15)
                           ELSE (a2+b2);

HADD.us ra,rb,rt      t1 = IF (a1+b1) > (2^16-1) THEN (2^16-1)
                           ELSEIF (a1+b1) < 0 THEN 0
                           ELSE (a1+b1);
                      t2 = IF (a2+b2) > (2^16-1) THEN (2^16-1)
                           ELSEIF (a2+b2) < 0 THEN 0
                           ELSE (a2+b2);

HSUB ra,rb,rt         t1 = (a1-b1) mod 2^16;
                      t2 = (a2-b2) mod 2^16;

HSUB.ss ra,rb,rt      t1 = IF (a1-b1) > (2^15-1) THEN (2^15-1)
                           ELSEIF (a1-b1) < -2^15 THEN (-2^15)
                           ELSE (a1-b1);
                      t2 = IF (a2-b2) > (2^15-1) THEN (2^15-1)
                           ELSEIF (a2-b2) < -2^15 THEN (-2^15)
                           ELSE (a2-b2);

HSUB.us ra,rb,rt      t1 = IF (a1-b1) > (2^16-1) THEN (2^16-1)
                           ELSEIF (a1-b1) < 0 THEN 0
                           ELSE (a1-b1);
                      t2 = IF (a2-b2) > (2^16-1) THEN (2^16-1)
                           ELSEIF (a2-b2) < 0 THEN 0
                           ELSE (a2-b2);

HAVE ra,rb,rt         t1 = (a1+b1)/2;
                      t2 = (a2+b2)/2;

HSLkADD ra,k,rb,rt    t1 = (a1<<k)+b1;
                      t2 = (a2<<k)+b2;
                      (for k = 1, 2, or 3)

HSRkADD ra,k,rb,rt    t1 = (a1>>k)+b1;
                      t2 = (a2>>k)+b2;
                      (for k = 1, 2, or 3)

ss = signed saturation option
us = unsigned saturation option

HSLkADD, or halfword shift left and add, allows one operand to be
shifted left by k bits (where k is 1, 2, or 3) before being added to
the other operand.

HSRkADD, or halfword shift right and add, allows one operand to be
shifted right by k bits (where k is 1, 2, or 3) before being added
to the other operand.

Both HSLkADD and HSRkADD use signed saturation.
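The use of shift left and add for multiplication by constants can be
sketched for a single halfword in Python. The model below is an
assumption-laden illustration: the shift-and-add is treated as the
mathematical value a*2^k + b clipped to the signed 16-bit range, and
the names hslkadd_half and times5 are hypothetical. How the hardware
detects overflow during the shift is not modelled.

```python
def to_signed16(h):
    """Interpret a 16-bit pattern as a twos complement signed integer."""
    return h - 0x10000 if h & 0x8000 else h


def hslkadd_half(a, k, b):
    """One halfword of shift left and add with signed saturation.

    Modelled as (a * 2**k) + b, clipped to the signed 16-bit range.
    """
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2, or 3")
    s = to_signed16(a) * (1 << k) + to_signed16(b)
    s = max(-2**15, min(2**15 - 1, s))
    return s & 0xFFFF


def times5(a):
    """Multiply a signed halfword by 5 as (a << 2) + a."""
    return hslkadd_half(a, 2, a)
```

For example, times5 maps 7 to 35 and the pattern 0xFFFD (that is, -3)
to 0xFFF1 (that is, -15), while an input of 10000 saturates at 0x7FFF
instead of wrapping around.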

* Unbiased rounding means that the net difference between the true
averages and the averages obtained after unbiased rounding is zero
if the results are equally distributed in the result range.

Saturation Arithmetic. In saturation arithmetic a result is said to
have a positive overflow if it is larger than the largest value in
the defined range of the result. It is said to have a negative
overflow if it is smaller than the smallest value in the defined
range of the result. If the saturation option is used for the HADD
and HSUB instructions, the result is clipped to the maximum value in
its defined range if positive overflow occurs and to the minimum
value in its defined range if negative overflow occurs. This further
speeds up the processing because it replaces using about ten
instructions to check for positive and negative overflows and
performs the desired clipping of the result for a pair of operations
in one instruction.

Saturation arithmetic is highly desirable in dealing with pixel
values, which often represent hues or color intensities. It is
undesirable to perform the normal modulo arithmetic in which
overflows wrap around from the largest value to the smallest value
and vice versa. For example, in 8-bit pixels, if 0 represents black
and 255 represents white, a result of 256 should not change a white
pixel into a black one, as would occur with modulo arithmetic. In
saturation arithmetic, a result of 256 would be clipped to 255.

Effect on MPEG Decoding. These parallel subword arithmetic
operations significantly speed up several critical parts of the MPEG
decoder program, especially in the inverse discrete cosine transform
and motion compensation steps. More than half of the instructions
executed for the inverse transform step are these parallel subword
arithmetic instructions. Their implementation does not impact the
processor's cycle time, and adds less than 0.2% of silicon area to
the PA 7100LC processor chip. Actually, the area used was mostly
empty space around the ALU, so that these multimedia enhancements
can be said to have contributed to more efficient area utilization,
rather than adding incremental chip area. See "Overview of the
Implementation of the Multimedia Enhancements" on page 60.

Since the PA 7100LC processor has two integer ALUs, we essentially
have a parallelism of four halfword operations per cycle. This gives
a speedup of four times, in places where the superscalar ALUs can be
used in parallel. Because of the built-in saturation arithmetic
option, speedup of certain pieces of code is even greater.

System Optimization

The second longest functional step (see Table I) in MPEG
decompression was the display step. Here, we leveraged the graphics
subsystem to implement the color conversion step together with the
color recovery already being done in the graphics chip. Color
conversion converts between color representations in the YCbCr color
space and the RGB color space. Color recovery reproduces 24-bit RGB
color that has been color compressed into 8 bits before being
displayed. Color compression allows the use of 8-bit frame buffers
in low-cost workstations to achieve almost the color dynamics of
24-bit frame buffers. This leveraging of low-level pixel
manipulations close to the frame buffer between the graphics and
video streams also contributed significantly to the attainment of
real-time MPEG decompression. Color recovery and the graphics chip
are described in the articles on pages 51 and 43, respectively.



Other PA 7100LC processor enhancements streamline the memory-to-I/O
path. By having the memory controller and the I/O interface
controller integrated in the PA 7100LC chip, overhead in the
memory-to-frame-buffer bandwidth is reduced. Overhead in the
processor-to-graphics-controller-chip path is also reduced for both
control and data.

Path Length Reduction 

Table III shows the same information as Table I but for the low-end
Model 712 workstation, which uses the multimedia-enhanced PA 7100LC
processor and the graphics chip mentioned above.

Table III
Instructions and Time Spent in Each MPEG Decompression Step
on a Model 712 Workstation

                                     Millions of    Path Length   Time
Step                                 Instructions   (%)           (%)

Header decode                          0.60           0.2           0.3
Huffman decode                        55.0           16.1          14.5
Inverse quantization                   8.9            2.6           4.5
Inverse discrete cosine transform    138.5           40.6          34.4
Motion compensation                   74.8           21.9          25.6
Display                               63.0           18.5          20.7
Total                                340.8          100.0         100.0


The Model 712 executes consistently fewer instructions than the
Model 735 for the same MPEG decompression of the same video clip. It
is also faster in MPEG decompression even though it operates at only
60% of the 99-MHz rate of the high-end Model 735 and has only one
eighth of the cache size. This shows the performance benefits from
the path length reduction enabled by the PA-RISC processor and
system enhancements for multimedia acceleration.

Performance

The performance of the PA-RISC architectural enhancements and the
leveraging of the graphics subsystem for video decompression can be
seen in Fig. 5. This data is for a

Fig. 5. Maximum MPEG decode frame rates for different models of HP
9000 Series 700 workstations. These rates are for a 352-by-240 video
clip that was produced at 30 frames/s.







Overview of the Implementation of the PA 7100LC Multimedia Enhancements 



One goal in adding the multimedia instructions was to minimize the amount of
new circuits to be added to the existing ALUs and to minimize the impact on the
rest of the CPU. This goal was accomplished. The only circuit changes to the CPU
were in the ALU data path and decoder circuits. These instructions reuse most of
the existing functionality and very small modifications and additions were
required to implement them.

All of the new instructions implemented require two 16-bit adds or subtracts to be
done in parallel. The existing ALU adder was modified to provide this functionality.
These instructions required that the existing 32-bit adder be conditionally split
into two 16-bit halves without sacrificing the performance of the 32-bit add.
Conceptually this is equivalent to blocking the carry from bit 16 to bit 15 in a
ripple-carry adder. To accomplish this, we made the following modifications.

The ALU adder is similar to a carry lookahead adder. The first stage of the adder
calculates a carry generate and a carry propagate signal for each single bit in the
adder. In this case, 32 single-bit generate and 32 single-bit propagate signals are
calculated. These single-bit carry generate and carry propagate signals are used in
subsequent stages of the carry chain to calculate carry generate and carry
propagate signals for groups of bits.

The 32-bit adder was divided into two 16-bit halves between bits 15 and 16 by
providing alternate signals for the carry generate and carry propagate signals from
bit 16 (Fig. 1). The new generate and propagate signals from bit 16 are created
with a two-input multiplexer. When a 32-bit addition or subtraction is being
performed, the multiplexer selects the original generate and propagate signals to be
passed onto the next stage of the carry chain. When 16-bit addition or subtraction
is being performed, the multiplexer selects the value for generate and propagate
from the second input, which is false (logical 0) for additions and true (logical 1) for
subtractions.

The new generate and propagate signals can be forced to be false for instructions
requiring halfword addition. This stops the carry from being generated by bit 16 or
propagating from bit 16 to bit 15, even if this generate and propagate signal is not
used directly to calculate the carry signal (as is the case in this adder). The
generate and propagate signals can also be forced to be true for instructions requiring
halfword subtraction. This will force a carry into the more significant halfword of
the adder by generating a carry from bit 16 into bit 15. This technique is used
along with the ones complement of the operand to be subtracted to perform
subtraction as twos complement addition.
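
The carry-blocking idea can be illustrated with a small behavioral model. This is only a sketch in Python, not the circuit itself: it models the carry at the halfword boundary as a single value rather than as generate and propagate signals, and the function name and parameters are hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def partitioned_add(x, y, halfword=False, subtract=False):
    """Behavioral model of a 32-bit adder that can be split into two 16-bit lanes.

    x and y are 32-bit patterns. Subtraction is performed as twos complement
    addition: the ones complement of y plus a carry-in of 1.
    """
    if subtract:
        y = ~y & MASK32                      # ones complement of the subtrahend
    carry_in = 1 if subtract else 0

    # Less significant halfword.
    lo = (x & MASK16) + (y & MASK16) + carry_in
    natural_carry = lo >> 16                 # carry the unsplit adder would pass up

    # Model of the two-input multiplexer at the halfword boundary:
    # 32-bit mode passes the natural carry, halfword mode forces it to
    # 0 for additions and 1 for subtractions.
    if halfword:
        boundary_carry = 1 if subtract else 0
    else:
        boundary_carry = natural_carry

    # More significant halfword.
    hi = (x >> 16) + (y >> 16) + boundary_carry
    return ((hi & MASK16) << 16) | (lo & MASK16)
```

For example, adding 0x0001FFFF and 0x00010001 in halfword mode leaves the upper lane unaffected by the overflow of the lower lane, while the same operands in 32-bit mode let the carry ripple into the upper half.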

The original carry generate and propagate signals from bit 16 are still generated to
calculate overflows from the less significant halfword addition. This overflow is
used by the saturation logic, which can be invoked by some of these instructions.

Saturation requires groups of bits of the result to be forced to states of true or
false, or passed unchanged. This is accomplished with an AND-OR gate (Fig. 2).
The AND function can force the output of the gate to be false and the OR function
can force the output of the gate to be true. Thus, the output is either forced high,
forced low, or forced neither high nor low. It is never simultaneously forced high
and low. The key is to determine when to force the result to a saturated value.

The saturation circuit is added at the end of the ALU's data path, after the result
selection multiplexer selects one of the results from the adder after it performs
additions, subtractions, or logical operations such as bitwise AND, OR, or XOR
(Fig. 3). The saturation circuit does not impact the critical speed paths of the ALU
because it is downstream from the point where the cache data address is driven
from the adder and where the test condition logic (i.e., logic for conditional branch
instructions) obtains the results from which to calculate a test condition.

If signed saturation is selected, the ALU will force any 16-bit result that is larger
than 0x7fff to 0x7fff (2^15 - 1) and any 16-bit result that is smaller than 0x8000 to
0x8000 (-2^15). These conditions represent positive and negative overflow of
signed numbers. Positive and negative overflow can be detected by examining the
sign bit (the MSB) of each operand and the result of the add. If both operands are
positive and the result is negative, then a positive overflow has occurred, and the
result in this case is saturated by forcing the most-significant bit to a logical 0 and
the rest of the bits to a logical 1. If both operands are negative and the result is
positive, then a negative overflow has occurred, and the result in this case is
saturated by forcing the most-significant bit to a logical 1 and the rest of the bits to a
logical 0. Unsigned saturation is implemented in a similar way.
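
The sign-bit test for signed saturation can be sketched in a few lines of Python. This is an illustrative model only, and the function name is hypothetical; operands and results are treated as 16-bit patterns.

```python
def sat_add16(a, b):
    """Signed saturating add of two 16-bit patterns (0..0xFFFF).

    Overflow is detected only from the sign bits of the operands and
    of the wrapped result, as described in the text.
    """
    raw = (a + b) & 0xFFFF                 # wrapped 16-bit sum
    sign_a, sign_b, sign_r = a >> 15, b >> 15, raw >> 15

    if sign_a == 0 and sign_b == 0 and sign_r == 1:
        return 0x7FFF                      # positive overflow: MSB 0, rest 1
    if sign_a == 1 and sign_b == 1 and sign_r == 0:
        return 0x8000                      # negative overflow: MSB 1, rest 0
    return raw                             # no overflow: pass unchanged
```

Operands of opposite sign can never overflow, which is why only the two same-sign cases need to be checked.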

The average instruction, HAVE, requires manipulating the result after the addition
is finished. Before the implementation of the halfword instructions, the ALU
selected between the results of a bitwise AND, a bitwise OR, a bitwise XOR, or the
sum of the two input operands. The halfword average instruction adds an
additional choice. The average result is the sum of the two input operands shifted
right one bit position, with a carry out of the most-significant bit (MSB) becoming
the MSB of the result. To perform rounding of the result, the least-significant bit
(LSB) of the result is replaced by an OR of the two least-significant bits before
shifting right one bit.
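
As a minimal sketch of the averaging step described above (Python, with a hypothetical function name, treating the operands as unsigned 16-bit values), the 17-bit sum is shifted right by one and its new LSB is ORed with the bit that was shifted out:

```python
def havg16(a, b):
    """Average of two unsigned 16-bit values with the rounding described above.

    The 17-bit sum keeps the carry out of the MSB, so the shifted result
    always fits in 16 bits.
    """
    s = a + b                    # 17-bit sum, including the carry out
    # After the shift, bit 0 holds the old bit 1; OR in the old bit 0
    # so the result LSB is the OR of the sum's two least-significant bits.
    return (s >> 1) | (s & 1)
```

With this rule, exact averages are unchanged and averages ending in one half are rounded to the nearest odd value, so the rounding does not consistently favor one direction.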

The shift right and add and the shift left and add functions were added by modifying
the X-bus preshifter in the operand selection logic of the ALU. The original ALU was
capable of shifting 32-bit inputs left by zero, one, two, or three bits. To implement
the 16-bit shift left and add instructions, the left-shift circuits had to be broken at
the halfword boundary. This was done by muxing the bits shifted from the
least-significant halfword to the most-significant halfword with a control signal that
indicates when a 32-bit shift is being done. The 16-bit shift right and add
instructions were implemented by adding the ability to shift one, two, or three bits right.
This shift is always broken at the halfword boundary.

Fig. 3. Flow of halfword instructions showing the location of the saturation logic.
(Blocks shown: general registers, operand 1 selection and preshifter, operand 2
selection and one's complement, 32-bit partitioned adder, cache address output.)

One challenging aspect of implementing the 16-bit shift left and add instructions
was detecting when the result of shifting an operand left by one, two, or three
bits causes a positive or negative overflow. A positive overflow occurs when the
unshifted operand is positive and a logical one is shifted out of the left, or when
the result of the shift is negative. A negative overflow occurs when the unshifted
operand is negative and a logical zero bit is shifted out of the left, or when the
result of the shift is positive. These overflow conditions are combined with the
overflows calculated by the adder and used to saturate the final result. The final
result is saturated if either the left shift or the adder causes an overflow.
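
The two overflow rules for the left shift can be written down directly. The following Python sketch is a behavioral illustration only; the function name and the returned tuple are hypothetical.

```python
def shl16_overflow(x, n):
    """Shift a 16-bit pattern left by n bits (n in 1..3) and detect overflow.

    Returns (shifted_pattern, positive_overflow, negative_overflow).
    """
    sign = x >> 15                               # sign of the unshifted operand
    all_ones = (1 << n) - 1
    shifted_out = (x >> (16 - n)) & all_ones     # bits pushed out of the left
    result = (x << n) & 0xFFFF
    result_sign = result >> 15

    if sign == 0:
        # Positive operand: overflow if any 1 is shifted out,
        # or if the shifted result looks negative.
        return result, (shifted_out != 0 or result_sign == 1), False
    # Negative operand: overflow if any 0 is shifted out,
    # or if the shifted result looks positive.
    return result, False, (shifted_out != all_ones or result_sign == 0)
```

In the hardware these flags are combined with the adder's overflow before the saturation logic is applied; the sketch stops at the detection step.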

The result of selecting instructions that can provide the most useful functionality
while costing the least to implement was a relatively small increase in the area of
the ALU. About 15% of the ALU's area is devoted to halfword instructions. Since
the ALU's circuits were the only ones modified on the processor chip, only about
2% of the total processor chip area is devoted to halfword instructions.



video clip that was compressed at 30 frames/s. The Model 715 and Model 735 are
based on the PA 7100 processor. The Model 712 is based on the PA 7100LC processor,
which is a derivative of the PA 7100. The PA 7100LC contains the multimedia
enhancements and system integration features and is described in the article on
page 12. The older, high-end Model 735 running at 99 MHz achieves 18.7 frames/s,
while the newer entry-level Model 712 achieves 26 frames/s at 60 MHz and
33 frames/s at 80 MHz. These frame decompression rates are quoted for MPEG video
only (no audio) with no constraints on how fast the decoding can proceed. In other
words, the decoding rate is not constrained by the rate at which the MPEG stream
has been compressed. Hence, although the video clip used was MPEG compressed at
30 frames/s, the 80-MHz Model 712 can decode it faster than 30 frames/s in
unconstrained mode. This implies that there is some processor bandwidth left after
achieving real-time software MPEG video decoding.

Fig. 6. Comparison between the performance of the enhanced Berkeley MPEG decoder
and the HP MPEG decoder (without audio).

In the video player product in HP MPower 2.0, frames are skipped if the decoder
cannot keep up with the desired real-time rate. This results in a lower effective
frame rate, since skipped frames are not counted, even though execution time
may have been used for partial decoding of a skipped frame.

Fig. 6 shows a comparison between the enhanced Berkeley software MPEG decoder
and the HP software MPEG decoder running on the older HP 9000 Model 720 (with no
hardware multimedia enhancements) and the newer Model 712 workstation (with
hardware multimedia enhancements). The fourth column in Fig. 6 illustrates the
performance obtainable with synergistic software and hardware enhancements.

In the Model 720, the Berkeley and HP software decoders have comparable
performance. For the Model 712, the performance of the HP decoder was 2.1 times
greater than that of the Berkeley decoder because of the synergistic coupling of the
algorithms and software optimized with the PA-RISC multimedia instructions and
the system-level enhancements in the Model 712.

Fig. 7 shows the performance when MPEG audio of various fidelity levels is also
decompressed by software running on the general-purpose PA 7100LC processor.
The highest-fidelity audio is stereo with no decimation. This means that every
audio sample comes as a pair of left and right channel values, and every sample is
used. Half decimation means that one out of every two audio samples is used.
(3/4 decimation means that only one out of every four audio samples is used.)
Mono means that every audio sample is a single value (channel) rather than a pair
of values.

While software decompression of MPEG audio degrades the performance in terms of
frames decoded per second, the PA 7100LC-based workstations achieved rates of
16.1 frames/s at 60 MHz, 24.2 frames/s at 80 MHz, and 27.4 frames/s at 100 MHz
even with the highest-fidelity 44.1-kHz stereo 16-bit linear audio format with no
decimation. With further enhancements of audio decoding and audio-video
synchronization, we should be able to do even better.

Fig. 7. Performance when MPEG video and MPEG audio are decoded in software.

Conclusion

We wanted a software approach to MPEG decoding because we felt that if video is to
be useful it has to be pervasive, and to be pervasive, it should exist at the lowest
incremental cost on all platforms. With a software video decoder, there is
essentially no additional cost. In addition, the evolving standards and improving
algorithms pointed to a flexible solution, like software running on a general-purpose
processor. Using special-purpose chips designed for MPEG decoding, or even for
JPEG, MPEG, and H.261 compression and decompression, would not allow one to take
advantage of improved algorithms and adapt to evolving standards without
buying and installing new hardware.

Furthermore, since the performance of general-purpose microprocessors continues
to improve with each new generation, we wanted to be able to leverage these
improvements for multimedia computations such as video decompression. This
approach also allows us to focus hardware design efforts on improving the
performance of the general-purpose processor and system without having to replicate
performance efforts in each special-purpose subsystem, such as the graphics and
video subsystems. The PA-RISC multimedia instructions are also useful for graphics,
image, and audio computations, or any computations requiring arithmetic on a lot
of numbers with precision less than 16 bits.

The net result is that we achieve real-time MPEG decoding of video streams at
30 frames/s with a software decoder. This was achieved by a synergistic combination
of algorithm enhancements, software tuning, PA-RISC processor multimedia
enhancements, combining video and graphics support for color conversions and
color compression, and system tuning. The PA-RISC multimedia enhancements allow
parallel processing of pixels in the standard integer data path at an insignificant
addition to the silicon area. The total area used is less than 0.2% of the PA 7100LC
processor chip, with no impact on the cycle time or the control complexity.

The real-time software MPEG decoding rate of the final video player product
exceeds our original goal of 10 to 15 frames/s for a software-based MPEG video
decoder. It is also significant that MPEG video decoding at 30 frames/s is
achieved by an entry-level rather than a high-end workstation. This is in the
context of a full-function video player on the HP MPower 2.0 product. With MPEG
audio decoding (also done by software), the frame rate is usually above
15 frames/s, even for the low-end Model 712/60 workstation, and around
24 frames/s for the Model 712/80 workstation.

We expect to see continuous improvement in the MPEG decoding rate as the
performance of the general-purpose processors increases. With PA-RISC processors,
there has been roughly a doubling of performance every 18 to 24 months. This
would imply that larger frame sizes, multiple video streams, or MPEG-2 streams
may be decoded in the future by such multimedia-enhanced general-purpose
processors.
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HP TeleShare: Integrating Telephone 
Capabilities on a Computer 
Workstation 



Using off-the-shelf parts and a special interface ASIC, an I/O card was
developed that provides voice, fax, and data transfer via a telephone line
for the HP 9000 Model 712 workstation.

by S. Paul Tucker



Integration of the telephone and the computer workstation is a natural step in the
evolution of the electronic office. It allows the user to perform telephone
transactions without having to change from the keyboard and mouse environment to
the telephone and handset environment and vice versa. This capability provides
obvious benefits to a wide customer audience, especially those dealing with customer
service and support. The HP TeleShare option card for the HP 9000 Model 712
workstation represents HP's first integrated telephony product. Coupled with
multimedia technologies such as audio, video, and HP SharedX,1 HP TeleShare
provides the user with a powerful arsenal of communications tools. This article
will focus mainly on the hardware aspects of the HP TeleShare product.

Features provided by HP TeleShare include:

• Two-line support, with each line configurable for voice, fax, or data

• Workstation audio support and mixing (stereo headset with built-in microphone
included)

• Dual-tone multifrequency (DTMF) tone generation and detection

• Telephone line status and control

• Call progress support

• Caller-ID support

• V.32bis modem (14,400 bits/s) with V.42bis and MNP5 (Microcom Networking
Protocol) compression and error correction

• Fax Group 3 Class II up to 14,400 bits/s.

Background 

HP TeleShare began as an experimental interface card for the HP 9000 Series 300
workstations. It had simple voice-only telephone capabilities, including
single-source audio record and playback, and it was perceived as useful (and
entertaining) to those engineers who were fortunate enough to have the opportunity
to use it. At some point, further investigation was needed and the HP TeleShare
project team was formed. It was determined that fax and data modem capabilities
were needed, with close coupling to the workstation's audio capability. Dual
telephone lines were included so the user could talk on one line and at the same
time use the other line for faxing or data. The standard analog phone line interface
was chosen over digital (i.e., ISDN) because of the relatively insignificant number
of digitally equipped PBX systems.

The first incarnation of the current product was an external RS-232-driven box
with stereo inputs for computer line-in and microphone audio, and stereo outputs
for computer line-out and headphones. It provided dual-line operation and
employed two DSP (digital signal processor) subsystems for maximum flexibility and
performance in voice and data modes. Audio mixing was provided by dedicated
analog hardware, and any combination of audio inputs could be sent to any output,
complete with treble and bass control. The audio capabilities were so good that
HP TeleShare engineers always had their CD players plugged into the box and their
headphones on. This forced development of an automatic audio mute feature when
an incoming call was detected. Since modem functionality was a primary goal, the
command interface for the box used a partial modem AT command set, along with
some proprietary extensions for new functionality like setting audio gains, setting
audio mix values, telling a DSP to reboot, and so on. These commands were
delivered over the RS-232 interface and received the typical OK and error responses.

While the external box was well received in the lab and by customers, it was
postponed indefinitely in favor of a lower-cost internal version with a proprietary
interface available on a single workstation, the Model 712. The same DSP subsystem
used in the external box was carried over to the Model 712 option card, and some
effort was exerted to leverage as much as possible of the external box's software
interface and feature set into the new design.

Architecture

HP TeleShare is made up of two independent DSP subsystems that communicate with
the workstation host through an interface chip called XBAR (see Fig. 1). Each DSP
is coupled to a hybrid chip called a data access arrangement, which provides direct
connection to a standard two-wire analog telephone line. HP TeleShare is tightly
coupled with the workstation audio system to provide the highest degree of audio
flexibility. For instance, line-in audio (perhaps from a CD player) could be sent to
telephone line 0 while the party on that line is on hold. Simultaneously, the
workstation user could be conversing with or faxing a message to the party on
telephone line 1. During this time comments from the party on line 0 could be
recorded to disk for later playback.

XBAR. The XBAR ASIC (application-specific integrated circuit) is a custom VLSI
part packaged in a low-cost 80-pin QFP (quad flat pack). It was designed by the
HP TeleShare team specifically for use with the Model 712 workstation and performs
all of the interface functions required by the HP TeleShare card. The XBAR chip
communicates with the system I/O chip (LASI) and the audio CODEC (coder/decoder)
through a pair of proprietary serial interfaces. If HP TeleShare is not present in
the system, audio data and CODEC control words pass through the bidirectional
buffer between the system I/O chip and the audio CODEC. When the HP TeleShare
card is installed, XBAR is effectively placed between the system I/O chip and the
CODEC, forcing all audio to be routed through XBAR in either direction. The serial
interface between XBAR and the audio CODEC carries 16-bit stereo audio data.

The serial interface between XBAR and the I/O chip multiplexes 16-bit system audio
data (to and from disk) and control words for XBAR. In addition, this interface is
used for modem data, voice-mode AT commands and responses, and DSP application
code downloaded from the host system. On the DSP side, XBAR has two 13.824-MHz
serial ports, each designed specifically for interfacing with the DSPs. These ports
are used for passing audio samples, modem data, application programs, and
commands and responses to and from the DSPs.

XBAR's configuration can be changed by writing to the control registers in the
XBAR-to-LASI I/O serial interface address space. In voice mode, XBAR is configured
to pass each audio data sample from the LASI I/O chip and CODEC to a DSP,
whereupon the DSP will return responses to each of those sources. XBAR can also
send audio data samples from DSP to DSP for conferencing between lines when both
lines are in voice mode. In data and fax modes, XBAR sends appropriately formatted
data to the DSP and receives similar data in return.

Although XBAR supports stereo audio at up to a 48-kHz sample rate, DSP bandwidth
limitations require all audio data to and from the telephone lines to be left-channel
only, sampled at 8 kHz. This is not a serious limitation, since telephone-quality
audio only requires a sample rate around 7.2 kHz for full reproduction and is
inherently a single-channel signal.

In addition to the DSP serial ports, XBAR also has a pair of byte-wide parallel
ports that connect to the DSPs' boot ROM ports. This allows DSP boot code (not to
be confused with DSP application code) to be downloaded from the host system.
This provides additional flexibility and eliminates the cost and board-space
limitations associated with external ROMs.

XBAR has several asynchronous control signals that are connected to downstream
hardware, including reset lines for the DSP subsystems and hook control and
telephone status (such as the ring indicator) signals to and from the data access
arrangement chips.

The biggest challenge in XBAR's design was purely logistical because we had a lot
of different data types (e.g., stereo data and telephone command and status data)
to handle and very little time to implement them in XBAR. There are no less than
52 separate data types that XBAR must recognize and generate for the two DSP
serial interfaces alone, with a slightly smaller number required for the system I/O
interface. To provide these types, each transaction between XBAR and a DSP
consists of 16 bits, with the upper eight bits providing type information and the
lower eight bits providing the associated data. To prevent data overruns, XBAR
requires a data acknowledge (Ack) word back from the appropriate DSP for every
transaction.

Audio data samples in HP TeleShare are 16 bits long. Since XBAR sends eight bits of
data at a time, audio samples must be broken into two pieces: an upper half, or
most-significant byte (MSB), and a lower half, or least-significant byte (LSB).
Using this model, it requires two transfers from XBAR and two Acks back to send
one sample of audio data to a DSP. Sending one set of stereo system and line-in
audio samples to a DSP requires eight output transfers (four transfers for the
system sample and four transfers for the line-in sample), with an Ack back after
each transfer. The DSP will then send mixed audio samples back for the system and
the CODEC, requiring an additional eight transfers, for a total of 24 transfers per
sample. This has to happen at an 8-kHz sample rate (once every 125 microseconds).
Fortunately, XBAR can handle these transactions, but order must be maintained
exactly or audio quality will suffer. Other data types, such as AT commands and
responses, are given lower priority during audio frames and are queued until audio
transfer is finished.
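
The 16-bit transaction format and the transfer count can be sketched as follows. This Python fragment is only an illustration: the article does not list the actual type codes, so TYPE_AUDIO_MSB and TYPE_AUDIO_LSB are hypothetical values, as are the function names.

```python
# Hypothetical type codes; the real XBAR values are not given in the article.
TYPE_AUDIO_MSB = 0x10
TYPE_AUDIO_LSB = 0x11

def pack_word(type_code, data):
    """Build one 16-bit transaction: type in the upper byte, data in the lower."""
    return ((type_code & 0xFF) << 8) | (data & 0xFF)

def split_sample(sample):
    """Break a 16-bit audio sample into two transactions (MSB first, then LSB)."""
    return [pack_word(TYPE_AUDIO_MSB, sample >> 8),
            pack_word(TYPE_AUDIO_LSB, sample)]

def join_sample(words):
    """Rebuild the 16-bit sample from the data bytes of two transactions."""
    msb_word, lsb_word = words
    return ((msb_word & 0xFF) << 8) | (lsb_word & 0xFF)

def transfers_per_frame():
    """Count the transfers in one 125-microsecond audio frame."""
    to_dsp = 2 * 2 * 2      # 2 sources (system, line-in) x 2 channels x 2 bytes
    acks = to_dsp           # one Ack back from the DSP per output transfer
    from_dsp = 8            # mixed audio returned for the system and the CODEC
    return to_dsp + acks + from_dsp
```

Splitting and rejoining a sample is lossless, and the count reproduces the 24 transfers per frame quoted above.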

Digital Signal Processor. The DSP used by TeleShare is an Analog Devices
ADSP-2101. This is a programmable single-chip microcomputer optimized for digital
signal processing, and operates at 16.67 MHz. The 2101 operates on 16-bit data
and uses a 24-bit instruction word. It has 1024 words of data RAM and 2048 words
of program RAM on the chip. The part has two data address generators and a
program sequencer, which allows program and data accesses to occur simultaneously
in a single cycle. Dual data operand fetches can also occur in a single cycle since
program memory can also be used to store data. The part can address up to 16K
words of data and 16K words of program memory, both of which are supplied on the
HP TeleShare board in the form of six external SRAMs.

The DSP has two independent serial ports, SPORT0 and SPORT1, which support
multiple data formats and frame rates and are fully programmable. In the HP
TeleShare design, SPORT1 on each DSP is dedicated to communication with XBAR,
while SPORT0 is dedicated to communication with an AD28msp01 telephone line
CODEC. Each full transfer to or from one of the SPORT lines triggers an associated
interrupt in the 2101, allowing programs to act on the incoming data as it arrives.

Analog Devices supplies a complete set of software development tools for the 2100
microprocessor family, including a C compiler with a DSP function library.

CODEC. The Analog Devices AD28msp01 provides HP TeleShare with a
multiple-sample-rate CODEC specifically designed for use in modem designs. This
device supports sample rates of 7.2 kHz, 8.0 kHz, and 9.6 kHz, and has an 8/7
mode* for sampling at 8.23 kHz, 9.14 kHz, and 10.97 kHz. For voice mode operation,
7.2 kHz and 8.0 kHz are all that is required, but the sophisticated algorithms used
by modern modem standards often require the other rates. The CODEC uses 16-bit
sigma-delta conversion technology and includes resampling and interpolation
filtering along with transmit and receive phase adjustments.
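
The 8/7-mode rates follow from the base rates by simple arithmetic. The short Python sketch below assumes, from the name of the mode and the 8-kHz to 9.14-kHz example, that each base rate is multiplied by 8/7; the function name is hypothetical.

```python
BASE_RATES_KHZ = [7.2, 8.0, 9.6]

def eight_sevenths_rates(base_rates):
    """Scale each base sample rate by 8/7 and round to two decimals (kHz)."""
    return [round(rate * 8 / 7, 2) for rate in base_rates]

print(eight_sevenths_rates(BASE_RATES_KHZ))
```

Under that assumption the three scaled rates come out as 8.23 kHz, 9.14 kHz, and 10.97 kHz.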

Each CODEC has one serial port, which is connected directly to SPORT0 on the
associated DSP. This port operates in free-running mode once it is properly
initialized and continually sends 16-bit data samples from the telephone line to
the DSP. All transfers to the DSP consist of a serial output frame sync followed by
a 16-bit address word, then a second frame sync followed by a 16-bit data word.
These address and data pairs are transmitted at the selected sample rate and
trigger SPORT0 receive interrupts in the DSP. The DSP transfers data to the CODEC
using the same mechanism as just described (in the other direction, of course). The
address portion of each transfer coming from the DSP identifies the data as a
control word (for programming the part) or as a data word to be sent through the
on-chip digital-to-analog converter (DAC) and transmitted to the data access
arrangement chip. Data from the CODEC to the DSP is identified as either a control
word or as a data word from the on-chip analog-to-digital converter (ADC).

The AD28msp01 CODEC is attached to the telephone line through the data access
arrangement chip. Transmit data outputs are differential for noise reduction, while
the receive data input is single-ended.

Data Access Arrangement. HP TeleShare uses the TDK 73M9002 data access
arrangement chip in its telephone line interface. This part provides all the
necessary line monitoring, filtering, isolation, protection, and signal conversion
functions for connection of high-performance analog modem designs to the PSTN
(public switched telephone network) in the United States, Canada, and Japan. The
73M9002 incorporates a two-to-four-wire hybrid, ring detection circuitry, an
off-hook relay, and on-hook line monitoring for caller-ID support (see "Caller-ID"
on page 72).

The 73M9002 comes with FCC (Federal Communications Commission) Part 68, DOC
CS-03, and JATE (Japan Approvals Institute for Telecommunications Equipment)
protection circuitry built in, and is compliant with UL 1459 2nd Edition with the
addition of an external slow-blow fuse. The off-hook relay is controlled by a
TTL-level input from XBAR. This relay determines when the phone is off the hook and
can be pulsed for use as a pulse dialer. Another TTL-level input from XBAR is used
to enable on-hook line monitoring.

* The 8/7 mode is a capability required by some modem applications. It simply adds
some sampling bandwidth; for example, in 8/7 mode the normal 8-kHz sample rate
becomes 9.14 kHz.

Caller-ID

Caller-ID information is sent between the first and second power ringing signals.
The data is sent a minimum of 500 milliseconds after the first ring and ends at
least 200 milliseconds before the second ring begins. This leaves 2.3 to 3.7 seconds
of time for data transmission. The data is sent at 1200 baud using frequency shift
keying (FSK) modulation. All data is 8-bit ASCII.

Two standard formats exist for Caller-ID information: single message format and
multiple message format. In general, both formats can be described using Fig. 1.

The message type is 0x04 (hexadecimal 04) for single message format. The message
length is variable and indicates the number of message words in the message
body. The final word is a checksum word, used for error checking. Single message
format provides the receiver with date, time, and calling number data.

The message type is 0x80 (hexadecimal 80) for multiple message format. The
message length is variable as before, but provides the receiver with date, time,
calling number, and calling name data if available. In the absence of calling name
data, a P indicating private or an O indicating out of area or unavailable will be sent.

Caller-ID detection requires on-hook line monitoring, which the HP TeleShare data
access arrangement chip fully supports. HP TeleShare can detect and display both
message formats.

Fig. 1. Caller-ID message format. The message header consists of a message type
word (bits 0 to 7) and a message length word (bits 8 to 15), followed by the
message body of message words starting at bit 16. Type 0x04 is single message
format and type 0x80 is multiple message format.
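
The Caller-ID message layout described in the sidebar (type, length, body, checksum) can be sketched as a small parser. This Python example is an illustration, not HP TeleShare firmware. The sidebar does not specify the checksum rule, so the sketch assumes the common convention that all bytes of the message, including the checksum, sum to zero modulo 256; the function names and the sample body are hypothetical.

```python
SINGLE_MESSAGE = 0x04      # single message format
MULTIPLE_MESSAGE = 0x80    # multiple message format

def make_frame(msg_type, body):
    """Build a frame: type byte, length byte, body, checksum byte.

    Assumed checksum rule: the sum of all bytes is 0 modulo 256.
    """
    partial = bytes([msg_type, len(body)]) + body
    checksum = (-sum(partial)) % 256
    return partial + bytes([checksum])

def parse_frame(frame):
    """Validate a frame and return (message_type, body)."""
    if len(frame) < 3:
        raise ValueError("frame too short")
    msg_type, length = frame[0], frame[1]
    if msg_type not in (SINGLE_MESSAGE, MULTIPLE_MESSAGE):
        raise ValueError("unknown message type")
    if len(frame) != length + 3:            # header (2) + body + checksum (1)
        raise ValueError("length mismatch")
    if sum(frame) % 256 != 0:
        raise ValueError("bad checksum")
    return msg_type, frame[2:2 + length]
```

A receiver would run a check like this on each decoded FSK burst and discard frames that fail it, since there is only one transmission per call between the first and second rings.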

The ring detection circuitry is capable of detecting ringing signals that comply
with Ringing Type B from the FCC Part 68 regulations. The detected ring signal
appears at a pair of differential outputs, which are also connected to XBAR.

The 73M9002 provides telephone connectivity to the DSP subsystem through the
CODEC's analog receive and transmit lines and is attached to the telephone line
through a standard RJ-14 connector.

Operating Modes 

HP TeleShare is capable of operating in three modes: voice, fax, and data. The
modes are selected through a graphical user interface by the workstation user. The
mode application software is downloaded through XBAR to a DSP as needed and runs
continually until a reset of that DSP is performed. The voice mode code was
developed solely by HP and can be run on one or both lines simultaneously. The fax
and data modem code was developed with a third party and, because of licensing
restrictions, only one line can be configured as a fax or data modem at a time.
Combinations of voice and fax or data are fully supported.



Voice Mode Operation. When configured in the voice mode, HP TeleShare
essentially operates like an enhanced telephone. Digital mixing of microphone,
line-in, telephone, and recorded audio (from system disk) is supported for both
playback and recording. This capability allows numerous interesting audio
configurations, including placing a line on hold with music, recording
conversations, playing back recorded audio over the phone, and so on. While in voice
mode, HP TeleShare provides the user with caller-ID information if it is available.
In addition, DTMF (dual-tone multifrequency) tone and pulse dialing are supported,
along with DTMF tone detection for unattended phone functions like answering
machines or voice mail (see "Call Progress, DTMF Tones, and Tone Detection" on
page 73).

Dialing and hook manipulation actions are performed through the GUI (graphical
user interface), but at the lowest level these actions are sent to the DSP as
standard AT commands like ATDT (attention dial tone) and ATHn (n = 0 is on-hook or
hang up, and n = 1 is take telephone off hook). Special functions like audio mixing
are also controlled with low-level AT-type commands, but are manipulated using
sliders in the GUI.

The voice mode application firmware is driven primarily by DSP SPORT interrupts.
Every incoming 16-bit SPORT1 word from XBAR triggers an interrupt, which in turn
causes the SPORT1 interrupt service routine to execute. Likewise, every 16-bit
SPORT0 word from the CODEC causes the SPORT0 interrupt service routine to execute.
The SPORT1 interrupt service routine is responsible for audio I/O with XBAR and
queueing AT commands as they arrive. Commands arrive asynchronously, that is, they
can arrive at any time, while audio arrives in 8-piece bundles every
125 microseconds (one frame) as described earlier. Normally, every piece of data
received by SPORT1 causes an interrupt, but the firmware disables these interrupts
for the rest of a frame once it recognizes the first piece of audio data. Otherwise,
at least eight context switches would occur every frame, which would render the
system useless. Once the SPORT1 interrupt service routine has received all of the
audio samples, it is responsible for transmitting the new audio back to XBAR for
routing to the workstation (i.e., headphones and/or disk).

The SPORT0 interrupt service routine is responsible for
receiving and transmitting telephone-line audio and mixing all
audio data, including DTMF tones. Before mixing can occur in
the DSP, all of the LSBs must be appended to the MSBs.
Remember that each 16-bit sample transferred between XBAR and
the DSP is divided so that the most-significant byte contains
the data type and the least-significant byte contains the
data. Thus, all the data from XBAR is put back into 16-bit
linear format before transfer to the CODEC.
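
The byte handling described above can be sketched as follows. This is only an illustration, not the TeleShare firmware: it assumes that one 16-bit audio sample travels as two 16-bit words, each carrying a data-type tag in its upper byte and one byte of audio in its lower byte. The tag values TYPE_AUDIO_MSB and TYPE_AUDIO_LSB are hypothetical, since the article does not list the actual type codes.

```python
# Hypothetical data-type tags; the real XBAR type codes are not given in the text.
TYPE_AUDIO_MSB = 0x01
TYPE_AUDIO_LSB = 0x02


def split_sample(sample):
    """Split a signed 16-bit sample into two tagged 16-bit transfer words."""
    raw = sample & 0xFFFF  # two's-complement view of the sample
    msb_word = (TYPE_AUDIO_MSB << 8) | (raw >> 8)
    lsb_word = (TYPE_AUDIO_LSB << 8) | (raw & 0xFF)
    return msb_word, lsb_word


def join_sample(msb_word, lsb_word):
    """Append the LSB to the MSB to rebuild a signed 16-bit linear sample."""
    if (msb_word >> 8) != TYPE_AUDIO_MSB or (lsb_word >> 8) != TYPE_AUDIO_LSB:
        raise ValueError("unexpected data-type tag")
    raw = ((msb_word & 0xFF) << 8) | (lsb_word & 0xFF)
    # Convert the unsigned 16-bit pattern back to a signed value.
    return raw - 0x10000 if raw & 0x8000 else raw
```

Splitting and rejoining any value in the signed 16-bit range should round-trip unchanged, which is the property the mixing step relies on.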

The audio input and output amplitude matrices, built by the
user via the GUI, are used to determine what the final mix
will sound like. The DSP firmware processes each output in
sequence by adding together any inputs that are on to create
a total value for each output. Any gain adjustments are made
at this time as well. When this is completed for all outputs,
the resulting 16-bit values are broken into MSBs and LSBs, if
required.
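
A minimal sketch of this per-output mixing step is shown below. The dictionary layout, the names mix_outputs and saturate16, and the clipping to the signed 16-bit range are assumptions made for illustration; the article does not describe the firmware's data structures or its overflow handling.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(value):
    """Clip a mixed value to the signed 16-bit range (an assumed policy)."""
    return max(INT16_MIN, min(INT16_MAX, int(round(value))))


def mix_outputs(input_samples, gain_matrix):
    """Mix one sample period.

    input_samples: {input name: signed 16-bit sample}
    gain_matrix:   {output name: {input name: gain}}, where gain 0.0 means
                   the input is switched off for that output.
    """
    mixed = {}
    for output_name, row in gain_matrix.items():
        total = 0.0
        for input_name, gain in row.items():
            if gain != 0.0:  # only inputs that are "on" contribute
                total += gain * input_samples[input_name]
        mixed[output_name] = saturate16(total)
    return mixed
```

For example, routing the microphone at full gain and the telephone at half gain to the headphones sums those two scaled samples and ignores every input whose gain is zero.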

Audio data that is meant for XBAR is transmitted during the
next XBAR audio frame. Audio data meant for output to the

Call Progress, DTMF Tones, and Tone Detection



HP TeleShare's voice mode firmware has the ability to detect a number of tones
commonly used in telephony, including DTMF tones and call progress tones (for
example, the sound you hear when you call someone, and dial tone).

DTMF Tones 

Dual-tone multifrequency (DTMF) tones are made up of two separate tones, as the
name suggests, and can be accurately generated using easily understood principles.
The DTMF system uses two sets of distinct tones, called row frequencies and
column frequencies (see Fig. 1). The row frequencies correspond to the horizontal
rows on a standard telephone touchpad. The column frequencies correspond to the
vertical columns on the touchpad, plus an additional column to the right of
the last touchpad column.

This makes eight separate frequencies, which combine for a total of sixteen DTMF
tones (see Fig. 2).

Generation of a DTMF tone is accomplished by creating a sinusoid for each of the
two frequencies, row and column, and then adding the results. In a digital
implementation, the sinusoids are computed and added on a sample-by-sample basis.
HP TeleShare uses a five-coefficient Taylor series approximation for the sinusoid
generation. The sinusoid samples are updated and added at 8 kHz, or every 125
microseconds, and the sum of the sinusoid samples is used as the current DTMF
sample.
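
The generation scheme can be sketched in a few lines. This is an illustration only: it calls the library sine function instead of the five-coefficient Taylor series used by the firmware, and the frequency table is the standard DTMF assignment rather than something taken from this text (Fig. 1 did not survive in this copy). The function names and the amplitude parameter are assumptions.

```python
import math

SAMPLE_RATE = 8000  # Hz, one sample every 125 microseconds

# Standard DTMF assignment: four row and four column frequencies.
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (row, column) frequency pair in Hz for one keypad symbol."""
    if len(key) == 1:
        for r, row in enumerate(KEYPAD):
            c = row.find(key)
            if c >= 0:
                return ROW_FREQS[r], COL_FREQS[c]
    raise ValueError("not a DTMF key: %r" % (key,))


def dtmf_samples(key, count, amplitude=0.5):
    """Generate `count` samples: the sum of the row and column sinusoids."""
    f_row, f_col = dtmf_frequencies(key)
    samples = []
    for n in range(count):
        t = n / SAMPLE_RATE  # time elapsed since the tone began
        samples.append(amplitude * (math.sin(2 * math.pi * f_row * t)
                                    + math.sin(2 * math.pi * f_col * t)))
    return samples
```

With amplitude 0.5 the summed samples stay within the range -1.0 to 1.0, so they can be scaled to 16-bit values without clipping.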

Tone Detection

Tone detection is accomplished through the use of a 512-point fast Fourier transform
(FFT), which is implemented in the ADSP-2101 C-language run-time library. The FFT,
when given a set of samples of an input signal over some time interval, returns the
frequency spectrum of the signal during the interval. This can be done in almost
real time with a DSP, making it very useful for detecting incoming tones. The
following important rules and relationships should be noted concerning sample
rate, input points, output points, time, frequency, and the FFT in general:

• The FFT requires complex (real and imaginary) data for input (two arrays).

• The imaginary input array may be filled with zeros if unused.

• The output data is complex (two arrays).

• The frequency spectrum returned covers half of the sampling frequency.

• Only the first half of output data is used, and the other half is a mirror image.

• The output frequency resolution is equal to (sampling rate)/(number of input points).

Using an 8-kHz sampling rate and 512 points causes the FFT to return a spectrum
from 0 to 4 kHz, with 512 complex output points. The second 256 output points
can be ignored since they are the mirror image of the first 256. The output will
have a resolution of 15.625 Hz per point, using the formula above. These output
points will be referred to as bins since they include spectral data on either side of
each point.
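
The resolution arithmetic is easy to check. The helper names below are hypothetical and exist only for this illustration.

```python
def bin_resolution(sample_rate_hz, num_points):
    """Frequency spacing between adjacent FFT output points, in Hz."""
    return sample_rate_hz / num_points


def bin_index(freq_hz, sample_rate_hz=8000, num_points=512):
    """Fractional bin index at which a given frequency falls."""
    return freq_hz / bin_resolution(sample_rate_hz, num_points)
```

With the 8-kHz rate and 512 points used here, bin_resolution gives 15.625 Hz, and the last useful bin (index 255) sits just below the 4-kHz upper limit of the spectrum.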

HP TeleShare calculates magnitude-squared values for each bin by squaring the
real and imaginary values at each point and adding them. The magnitude-squared
values correspond roughly to the power of the signal in each bin. Once the powers
are known for each bin in the spectrum, they can be analyzed to see if any DTMF
or call progress tones are present.

Fig. 2. Call progress and DTMF tones recognized by HP TeleShare.

As an example, suppose the telephone has been taken off-hook in preparation for
dialing and HP TeleShare is configured to check for dial tone. 512 samples of the
input signal would be stored in the real input array, while the imaginary array is
filled with zeros. Next, the FFT function is called, returning the real and imaginary
arrays. The magnitude-squared values of the first 256 bins are computed using the
two output arrays. The two frequencies that make up a dial tone are 350 and 440 Hz
(see Fig. 2), so the indexes (or bin numbers) must be computed for these frequencies:



350/15.625 = 22.4

440/15.625 = 28.16



Fig. 1. Dual-tone multifrequency digits and the frequencies associated with them.



An effective method of checking for the existence of a particular frequency is to
compare the power present at that frequency with the total power of the spectrum.
This is done quite easily with magnitude-squared values since they represent
power in each bin already. Total power is simply the sum of all the magnitude-
squared values for the first 256 FFT return values. Divide this into the power of the
frequency being checked for, and the result is the percentage of total power for
that frequency. For example, when checking for 350 Hz, compute the sum of the
power values for bins 22 and 23 since the real index (22.4) falls between them,
and then divide by the total power. The result is the percentage of the total power
present around 350 Hz. The same can be done for 440 Hz, using bins 28 and 29.

Once the percentage of total power is calculated, a comparison can be made to see
if the power in each frequency meets match criteria. The HP TeleShare firmware
typically uses 35% of total power as a match condition. In other words, if the power
present at the desired frequencies is 35% or more of the total power, dial tone has
been detected. Otherwise, no dial tone is found.
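
The dial-tone check walked through above can be sketched as follows. This is an illustration under stated assumptions, not the firmware: a small recursive radix-2 FFT stands in for the DSP run-time library routine, no window is applied, and the 35% threshold is applied to each of the two frequencies separately, which is one reading of the match condition. All function names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def power_fraction(samples, freq_hz, sample_rate=8000):
    """Fraction of total power in the two bins on either side of freq_hz."""
    n = len(samples)
    assert n >= 4 and n & (n - 1) == 0, "length must be a power of two"
    # Real input array holds the samples; the imaginary array is all zeros.
    spectrum = fft([complex(s, 0.0) for s in samples])
    # Magnitude-squared values for the first half of the output only.
    power = [z.real * z.real + z.imag * z.imag for z in spectrum[: n // 2]]
    total = sum(power)
    if total == 0.0:
        return 0.0
    lo = int(freq_hz * n / sample_rate)  # e.g. 22 for 350 Hz, 28 for 440 Hz
    if lo + 1 >= n // 2:
        raise ValueError("frequency too close to half the sample rate")
    return (power[lo] + power[lo + 1]) / total


def dial_tone_detected(samples, threshold=0.35):
    """True if both 350 Hz and 440 Hz hold at least `threshold` of the power."""
    return (power_fraction(samples, 350) >= threshold
            and power_fraction(samples, 440) >= threshold)
```

Feeding the function 512 samples of an equal-amplitude 350-Hz plus 440-Hz signal should report a match, while silence or a single unrelated tone should not. Applying a window before the FFT, as discussed next, would change how much power leaks into neighboring bins and therefore how many bins are worth summing.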

The number of bins used in the comparison and the match criteria can be fine-
tuned for a particular application. The match criteria can include other tests and
can be relaxed or tightened as needed. The number of bins used can be influenced
by the total number of points in the FFT and by a preprocessing tool that does
windowing. Windowing is used to create a finite-length sequence from a continu-
ous sequence. It is basically a digital filter that truncates an infinite-length input
sequence while preserving its frequency characteristics. Since we are grabbing
finite pieces (sequences) of data, we need to window the data.

telephone line is immediately sent to the CODEC without
being split in half. Since both interrupt service routines run
at 8 kHz, there is no need to worry about sample rate changes.
DTMF audio data is only available for mixing when a tone is
being generated. A new DTMF sample is generated during
every SPORT0 interrupt and is based on the sample rate
(always 8 kHz) and the time elapsed since the tone began.

All of these interrupts and audio manipulations require
almost all of a DSP's processing bandwidth and can affect
some areas of system performance. Because of DSP bandwidth
limitations, DTMF detection can have a slight but
noticeable effect on the audio quality heard by the user.
However, in unattended modes like answering machines or
voicemail (where DTMF detection could be used for such
things as navigation), this should not be a concern. The
default configuration has DTMF detection disabled, since the
typical user will never use it, and the current GUI does not
support it.

Fax and Data Modem Operation. The fax and data modem
functionality was codeveloped by HP and Digicom Systems
Incorporated and uses their SoftModem technology. The fax
mode allows transfers up to 14,400 bits/s and covers Group
3 Class II and all fallbacks. Data mode supports transfers up
to 14,400 bits/s (V.32bis) and can reach peak rates of 57,600
bits/s with compression.

Conclusion

HP TeleShare effectively combines telephone communications
capability with a low-cost computer workstation. Context
switches between the display and the telephone are
minimized by integrating the telephone into the computer
system and providing an easy-to-use graphical user
interface. Voice, fax, and high-speed data modes are supported
using flexible digital signal processing technology.

Product Design of the Model 712
Workstation and External Peripherals

A product design without fasteners and the use of environmentally
friendly materials and low-cost parts with integrated functions provides
excellent manufacturability, customer ease of use, and product
stewardship.

by Arlen L. Roesner



The HP 9000 Model 712 workstation and the three new
peripherals that go with the product are an excellent example
of computer integration and simplicity. The new workstation,
while providing a new class of performance with
HP's new PA-RISC PA 7100LC processor, pushed the envelope
of product design by using relatively few and inexpensive
parts. In addition to simplicity and low cost, the product
promotes good product stewardship by making parts easy to
identify and recycle. Customers find the hardware easy to
manage because there are no fasteners to deal with, and all
the components snap or drop into place. The main workstation
product is a small compact size that fits easily under
a monitor or stands vertically on the desk, and the external
peripherals can be positioned on the desktop where they are
most convenient to the user. Fig. 1 shows the Model 712
workstation and its three peripherals.



Outward Simplicity 

Several assemblies of the Model 712 workstation products
have high levels of functional integration. This functional
integration tends to make components more complex, but
yields an outer simplicity by reducing the number of physical
parts and the methods necessary to work with them.
Once configured, the only accessible components of the
Model 712 workstation include the chassis, system board,
option boards (including memory), disk drive, flexible disk
drive, and top cover. All of these components are accessed
through quick removal of the cover and the manipulation of
a few snap or drop-in fits, which require a minimum of time
and effort. Fig. 2 shows the workstation and one of the
peripherals with their covers removed. Benefits of this
resulting simplicity include better manufacturability, easier
customer use and configuration, and serviceability.




Fig. 1. HP 9000 Model 712 workstation and related external
peripherals.

Fig. 2. The Model 712 workstation and hard disk peripheral with top
covers disassembled.

Fig. 3. Top view of the Model 712 without top cover, with the main
system board and expansion boards (three slots) labeled.



chamber of the product. The main circuit board, power supply
and cover, disk brackets, and top cover all snap or drop
into this chassis. Option boards are also easily installed into
the chassis on top of the main system board, with integral
bulkheads that mate vertically to chassis cutouts (also
without fasteners).

Power Supply Cover 

The power supply cover is another example of integration.
Many parts were "designed out" by this single plastic part
that performs six functions. The main function is to protect
end users from dangerous voltages by shrouding the exposed
power supply. The cover snaps into the chassis from front to
rear and is removable only by using a screwdriver to
disengage the snap that holds it in place. In addition to shrouding
the power supply, the cover secures the power supply board
in place, houses the fan and speaker, channels air flow, and
provides structural support for the monitor. The fan simply
snaps down inside the cover and seals to the sides and top
of the cover. The speaker slides down and press fits into a
simple pocket, which provides acoustic baffling. After the
cover is installed, cables from these devices are routed to
the main system board for electrical connection.



Electronics 

The system electronics is the place where integration is most
likely to be first noticed in the Model 712 product. Electronic
assemblies consist of the main system board, a power supply,
three optional circuit boards, and up to four memory
SIMMs. The main system board is relatively small, and all of
the core electronics is incorporated onto this board through
integration of functionality into relatively few VLSI
components. (Fig. 3 in the article on page 11 shows the main system
board.) The main system board uses dual-sided surface
mount construction, with I/O connector space being
provided mostly by double-high (stacked) bulkhead connectors.
Optional boards are provided for telephony, extra I/O, and
high-resolution graphics. Compared to today's personal
computers, the Model 712's system board functions are
usually found on a personal computer's motherboard,
backplane (if any), and two to three expansion boards. This level
of integration on the Model 712 exceeds the density of
personal computer functionality, while providing current
workstation performance.

Chassis 

The chassis assembly consists of a plastic base, a metal
chassis, a metal liner for EMI containment of the rear I/O
connectors, and a plastic rear dress panel (see Fig. 4). The
dress panel includes silkscreened graphics to identify the
connectors and state necessary regulatory information,
eliminating the need for information labels. The chassis has
a variety of holes and embossments to assist in joining the
plastic parts to it. The plastic base provides outer air venting
and cosmetic appeal to the product while also containing
several snaps and guides for mating parts. The metal liner
provides EMI finger contact to all connectors in one part,
whereas previous products often required many different
clips for such functionality. Held together via plastic heat
stakes, the plastic base, the metal chassis, the metal liner
and the plastic dress panel make up the main assembly



HP-PAC Disk Brackets 

The disk brackets are made of HP's newly patented HP-PAC
material.¹ This material is made of expanded polypropylene
beads, and is used most often to produce shipping carton
cushions for many types of products. Rather than being
placed around a finished product to cushion it in a shipping
carton environment, the material is formed to fit inside a
product with integral recesses to embed internal components.
For the Model 712 workstation, the HP-PAC material
is used to hold the hard disk and flexible disk mechanisms
in place. The HP-PAC used in the workstation consists of
three parts: a bottom shell which provides a recess for both
flexible and hard disk, and two separate top pieces for
covering each disk mechanism (see bottom portion of Fig. 4).
Because of the cushioning properties of the HP-PAC
material, the disk drive mechanisms benefit from reduced shock
and vibration levels. The HP-PAC material also provides
integral air channels for inlet air to be drawn across hot
areas of the disk drive mechanisms. The interesting feature
of HP-PAC is that no screws are needed to install the
mechanisms. The devices simply drop into recesses inside of the
cushioning material, and cables can be connected directly to
the embedded mechanisms. Once in place, the chassis
enclosure then retains the top and bottom shells of HP-PAC
around each device.

Top Cover 

The top cover includes a configurable bezel for the flexible
disk area, a plastic top shell and a thin metal liner to
complete the EMI enclosure. The liner is held to the cover via
plastic heat stakes and has a series of fingers on each side of
the cover to contact the chassis and contain EMI radiation.
The flexible disk bezel is designed to snap into the front of
the cover, which then configures the frontal appearance of
the product. The cover assembly drops vertically onto the
chassis and then slides rearward until alignment hooks and
snaps in the cover engage to hold the cover in place.

Fig. 4. The Model 712 workstation showing components
disassembled from the chassis.



External Peripheral Products

The product design of the three external peripherals also
includes a large degree of functional integration. Each of
these boxes is designed as a miniature Model 712
workstation, with HP-PAC cushions providing location and
support for the drive mechanism, a printed circuit board (for
power conversion), power switch plunger, and cabling. The
plastic cover for each product includes any necessary doors,
light pipes, and buttons. The chassis assembly of each
product integrates a plastic base, metal chassis, spring clip, dress
panel and SCSI signal cable (attached with screws by the
vendor). Thus, final assembly parts involved in the
manufacturing of the box include only the chassis assembly, internal
power cable, printed circuit board, plunger rod, HP-PAC,
disk mechanism, and top cover. Like the workstation, there
are no fasteners for manufacturing or the customer to deal
with, and the top cover snaps into place to retain all parts
inside.

Low Cost for Entry-Level Pricing

To command lower material costs for mechanical
components, all custom plastic and sheet metal parts were
hard-tooled for mass production. The chassis of each product
was designed with a minimum of folded features to reduce
part complexity and the cost associated with that
complexity. All major sheet metal parts use progressive tooling for
the lowest price.

To reduce the amount of final assembly time (and labor
costs) involved in the product, components were designed
with a high degree of functional integration. Integrated
components (such as chassis or top cover assemblies) are
assembled by vendors, placing the burden of labor on these
non-HP processes and thus achieving lower pricing of the
final product. This functional integration of components also
lowers cost by reducing part count and related inventory
management.

Because of the no-fastener design, final assembly takes
under four minutes for the workstation product and
comparable times are achieved for the external peripherals. This
ease of manufacturing lowers manufacturing costs because
of reduced assembly time and overhead costs. It also makes
the product much better suited to indirect market channels,
which prefer to configure products themselves and often do
this at the last possible moment before shipment.

Environmentally Friendly 

The Model 712 workstation and peripherals also conform to
HP's new guidelines for product stewardship. Virtually every
component of the workstation and peripheral products can
be easily disassembled, identified, and recycled. Each
plastic part contains engraved information that identifies the
type of plastic used, and only four different types of plastic
are used within the entire family of products. To assist the
disassembly process, the products use plastic heat staking
to join parts together, which can easily be cut away during
the disassembly process. The new HP-PAC material can be
recycled as well, either by grinding to pellet size and reusing
in other shipping cushion parts, or by melting the material
down to solid plastic. And again, because there are virtually
no fasteners to deal with, disassembly is quick and thus
more parts are given to recycling. Materials with bromide
compositions have been avoided, except for the HP-PAC
parts, which require a bromide flame-retardant treatment to
meet safety requirements.

Other product stewardship features include:
• No painted components (all plastics with molded colors)
• No plated plastics
• No adhesives
• Required labels can be recycled along with plastic base
material
• Reusable aftermarket components (flexible and hard disk,
power supply, CPU, and fan)
• Bulk packaging of final assembly components implemented
on larger parts (reduces manufacturing waste)
• Printed circuit boards built in approved non-ODS (ozone-
depleting substance) processes
• Embedded fan (low acoustic noise).

Reference

1. J. Mahn, et al, "HP-PAC: A New Chassis and Housing Concept for
Electronic Equipment," Hewlett-Packard Journal, Vol. 45, no. 4,
August 1994.

Development of a Low-Cost,
High-Performance, Multiuser Business
Server System

Using leveraged technology, an aggressive system team, and clearly
emphasized priorities, several versions of low-end multiuser systems
were developed in record time while dramatically improving the product's
availability to customers.

by Dennis A. Bowers, Gerard M. Enkerlin, and Karen L. Murillo



The HP 9000 Series 800 Models E25, E35, E45, and E55
(Ex5) and the HP 3000 Series 908, 918, 928, and 938 (9x8)
business servers were developed as low-cost, performance-
enhanced replacements for the HP 9000 F Series and low-
end G Series and the HP 3000 Series 917, 927, 937, and 947.
The development of the PA-RISC PA 7100LC processor chip
and the LASI (LAN/SCSI) I/O interface and the evolution of
DRAMs for main memory enabled the development of these
low-end servers. The PA 7100LC and the LASI I/O interface
are described in the articles on pages 12 and 36 respectively.

The priorities for the Models Ex5 and Series 9x8 server
project were short time to market, low cost, and improved
performance. The functionality and quality of the new
servers were to be as good as the products they were replacing,
if not better. The challenge was to get these new servers to
market as soon as possible so that HP could continue to be
competitive in the business server market and our customers
could benefit from better performance at a lower price. We
were able to get the first versions of these systems
completed, released, and shipping on time with all new VLSI
components.

Low-Cost, Higher-Performance Features

The principal reason for achieving high integration and low
cost for the Model Ex5 and Series 9x8 servers was the
development of the PA 7100LC processor chip, which was being
developed at the same time as our servers. Integrating the
floating-point unit, the 1K bytes of internal instruction
cache, the external cache interface, the TLB (translation
lookaside buffer), the memory controller, and the general
system connect (GSC) I/O interface inside the PA 7100LC
processor chip allowed the Model Ex5 and Series 9x8
designers to condense the CPU and main memory onto
the same board.

Also, at the same time as our new servers were being
developed, DRAM densities doubled (in some cases quadrupled) to
allow more memory to be put into a smaller space. The
Model Ex5 and Series 9x8 servers use the same industry-
standard ECC (error correction coded) SIMM modules used
in the HP 9000 Model 712 and other HP workstations. The
Model Ex5 and Series 9x8 servers use 16M- and 32M-byte
SIMMs which must be inserted in pairs to provide 32M to
256M bytes of main memory. ECC memory was chosen
because it carries two additional address lines, making it
possible to put four times the memory capacity on one SIMM
while staying compatible with industry-standard modules.
The 64M-byte SIMM was designed several months after first
introduction of the new low-end servers to boost their
maximum memory to 512M bytes. This larger SIMM is not
available as an industry standard.

Four versions of the Model Ex5 and Series 9x8 processor
have been developed, differentiated by clock speed, cache
size, and cost. Each version is fully contained on the system
board (which also contains cache, main memory, processor
dependent hardware and firmware, and 802.3 LAN connect)
and is easily installable and upgradable. Table I lists the
technical specifications for the different Model Ex5 systems
and summarizes the HP-UX performance characterizations.
The Series 9x8 MPE/iX systems have equivalent CPU
hardware, and their specifications are close to those given in
Table I.

Table I

Technical Specifications for HP 9000 Model Ex5 Systems
Running the HP-UX Operating System

Architecture

Fig. 1 shows a block diagram for the Model Ex5 and Series
9x8 servers.

The general system connect (GSC) bus was designed as a
new, more powerful system bus for higher performance. The
Model Ex5 and Series 9x8 servers only use the GSC bus for
the processor, main memory, and 802.3 LAN through the
LASI chip. The midrange and high-end server systems also
support the GSC bus as their high-performance I/O bus. All
PA-RISC systems support the HP-PB (HP precision bus) as
the common I/O bus because multiple functionality
(hardware and drivers) currently exists for this bus. The interface
from the GSC bus to the HP-PB is accomplished in a chip
called the HP-PB bus converter.

The HP-PB bus converter chip is a performance-improved
version of the bus converter that was used in the HP 9000 F
and G Series and HP 3000 Series 9x7 machines. This chip
allows the Model Ex5 and Series 9x8 servers to leverage
HP-PB I/O functionality from the systems they are replacing.



The HP-PB bus converter implements transaction buffering†
as an HP-PB slave, gaining performance improvements of
10% to 28% over its predecessor. The chip supports GSC to
HP-PB clock ratios ranging from 3:1 to 5:1 in synchronous
mode when the GSC bus is operating under 32 MHz. It
switches to asynchronous mode when the GSC bus operates
in the 32-to-40-MHz range. These ratios and the
asynchronous feature of the HP-PB bus converter allow fair flexibility
in CPU and GSC operating frequencies while maintaining a
constant 8-MHz HP-PB frequency. The bus converter also
provides an interface to the access port used for remote
support, and the control signals used for the chassis display
and status registers. The chip is designed for the HP
CMOS26B process and comes packaged in a 208-pin MQFP
(metal quad flat pack).

The other key VLSI chip used in the I/O structure for the
Model Ex5 and Series 9x8 servers is the LASI chip. The LASI
chip is designed to have the same integration impact on core
I/O as the PA 7100LC had on the CPU and GSC bus interface.
The workstation products are able to take advantage of this
(see article on page 6), but the multiuser server systems
were not able to take advantage of LASI functionality.

† With transaction buffering, during reads from disk, data is buffered so that HP-PB
transactions can continue at maximum pace.

LASI functionality includes interfaces to IEEE 802.3 LAN,
SCSI, processor dependent code, Centronics, RS-232, audio,
keyboard, flexible disk, and GSC bus arbitration logic and
the real-time clock. Because HP-UX and MPE/iX software
drivers could not be made available in time for our release,
only a small subset of LASI functionality could be used on
the new servers. Thus, the decision was made to continue
using the core I/O card from the previous versions of low-
end servers because it provides all the functionality needed.

For the 96-MHz version of the Model Ex5 and Series 9x8
servers, a chip with a subset of the functionality of LASI was
used. This was developed as a cost reduction for those
applications that use only the LAN, GSC bus arbitration, and
processor dependent code path. The 96-MHz version had to
add a real-time clock on the system board to have equivalent
functionality to what was needed from LASI.

In addition to the above VLSI chips and printed circuit
boards, the Model Ex5 and Series 9x8 servers have the
internal capacity for two disk drives (4G bytes), two removable
media devices, and up to four I/O slots. The packaging and
power supplies for the new servers are highly leveraged
from the previous low-end server systems.

Meeting Fast Time-to-Market Goals

Meeting deadlines for any program is always a challenge.
Too often it is believed that a few extra hours a week is all
that is needed to keep the project on track. But many well-
intentioned programs soon lose time with unexpected
delays even when the project team is made up of industrious
folks willing to do whatever it takes to stay on schedule.

At large corporations like HP, where releasing a product to
market may span several divisions, the task is even more
daunting. With our lab's mission of providing world-class
low-end commercial business systems and servers, time to
market is always expected to be a key objective. In the case
of the Model Ex5 and Series 9x8 program, it was the primary
objective. Additionally, we were challenged to keep cost
projections in line with the set goals, and to meet or exceed
the quality of the versions of the low-end servers that we
were replacing. Quality is consistently a key objective on all
HP products.

The main challenge for the Model Ex5 and Series 9x8
program was to achieve (on schedule) an order fulfillment
cycle time† of 10 or fewer days for the entire product family.
With the existing product family averaging order fulfillment
cycle times four to five times larger than our goal of 10 or
fewer days, it was evident that for the new servers a
well-orchestrated program that involved the entire system
team was necessary to meet this challenge.

Fig. 2 shows a spider chart of the overall metrics for the
Model Ex5 and Series 9x8 program. Note that the program
achieved or exceeded all planned manufacturing release
goals. Even the factory cost goal was exceeded, which was
at risk when the hardware team added existing material into
the design over less expensive functionality, reducing the
software development schedule. The order fulfillment cycle
time objective was not only achieved but exceeded! For the
first three months of production, order fulfillment cycle time
averaged under nine days.

† Order fulfillment cycle time is measured from when HP receives a customer's order to the
time when the order is delivered at the customer's dock.

The following sections summarize the reasons we met or
exceeded our cost, quality, time-to-market, and
manufacturing release goals.

Consolidation of Project Team. When the Model Ex5 and
Series 9x8 program was in its early stages of design the
development team was dispersed in two different geographic
locations. The remote organization was eliminated and the
project development and management were consolidated in
one location under one manager. With this organization,
technical decisions regarding system requirements could be
made quickly and effectively.

Ownership of Issues. A system team composed of
representatives from the different organizations involved in the
development of the Model Ex5 and Series 9x8 servers was
organized. Weekly one-hour meetings were held with the
main focus on issues or concerns that impacted the project
schedule. Communication was expected to be limited to
discussions that affected everyone. Issues were captured
and assigned an owner with a date assigned for resolution of
the issue. Representatives at the meeting were expected to
own the issues that were presented to their organization. No
issue was closed until the team agreed upon it. This ensured
that technical problems did not "bounce" around looking for
an owner.

Interdivisional Communication. Effective interdivisional teams
establish good working relationships to ensure timely
response to actions and issues. An example was the decision
to change the core I/O functionality. While the hardware
team improved their factory cost by incorporating new, less
costly hardware, the software team would have finalized a
longer schedule to provide new software features to support
the new hardware. After reviewing the plans, the hardware
team, aware of the critical time-to-market objective,
recommended a return to the existing I/O feature implementation at
an impact to factory cost for the sake of the software
development team's ability to improve their schedule. The end
result was that the hardware team still achieved their factory
cost goals (by making adjustments elsewhere), and the
software development team achieved their schedule goals.

Leverage Design Where Possible. When time to market was
established as the key objective for the project, the
development teams realized that leveraging from as many existing
products as possible would greatly benefit achieving this
goal. The following components were leveraged from new
or existing products:

• Product package. Sheet metal was leveraged from the
existing low-end business servers with minor changes to
accommodate new peripherals and a different processor and main
memory partitioning scheme. Plastic changes were kept to a
minimum in an effort to use tools already established. (Only
one new tool was required.)

• Base system configuration. The base system was established
using the I/O printed circuit boards and several peripherals
available on the existing low-end servers.

• Memory. The memory design was leveraged from the
memory configuration used in the HP 9000 Model 712
workstation, which uses SIMM modules for the base memory
system. Higher-density memory was designed specifically
for the Model Ex5 and Series 9x8 servers after first release
to increase their maximum memory capacity.

• Power supply. The power supply was leveraged from the
existing servers.

• Printed circuit boards. The core I/O boards from the
existing servers were used with only minor firmware changes to
the HP-UX version. The processor board and backplane
were new designs based on ideas shared with the Model 712
development team.

• VLSI. The PA 7100LC processor chip and the LASI core I/O
chip were leveraged from the Model 712 workstation, which
was being designed at the same time as our server systems.

• Firmware. Some of the firmware and I/O dependent code
was codeveloped with the Model 712 development team.

Fast Time to Manufacturing Release. The use of concurrent
engineering played a key role in reducing the back-end
schedule. The back end of the schedule consists largely of
manufacturing activities (including final test and
qualification) aimed at achieving a release of the product for volume
shipment. In the case of the Model Ex5 and Series 9x8
servers, with the individual boards being built in two
geographically different manufacturing facilities, it was imperative
that communication between these entities receive ample
attention.

To facilitate this communication, a coordination team
consisting of new product introduction engineers and new product
buyers and logistics people was located in close proximity
with the R&D development team. Everyone attended the
system team meetings, which were led by the hardware lab,
to ensure that the most current information was applied to
the overall system schedule. In addition, production build
meetings were held before, during, and after each prototype
run to discuss build issues. Ensuring that all manufacturing
personnel realized that these systems were engineering
prototypes, with a high potential for problems, was a difficult
task. Most people were not used to seeing lab prototypes
being built in a production process. Since the line was
shared with currently shipping products, it was extremely
important to ensure that building the prototypes did not
impede shipping other products.

Prototype Management. Two operating system environments
were required for the new servers, the HP-UX operating
system release 9.04 and the MPE/iX operating system
version 4.0. Since these environments were under development
at the same time as our products, it was essential that
hardware prototypes be delivered efficiently and be of sufficient
quality to ensure expedient use by the software
development groups. Thus, three key objectives were considered
essential by the development groups. First, units had to be
of the highest quality. Second, delivery of the units had to be
on time. Finally, downtime because of hardware problems
had to be minimized.

To accomplish the first goal, all prototypes were built using
the entire production process. No prototypes were
handcrafted in the lab. This ensured that units were built with the
same quality standards as are applied to released systems.
Additionally, each customer was assured of receiving the
latest revision of materials released to production. Even
new parts not covered under manufacturing release criteria
were guaranteed to be of the same revision level. All
revision levels were tracked on each unit for the life of the
project.

For the second objective, a customer priority list was
generated based on customer orders and needs. After the orders
were submitted to the manufacturing systems, build priorities
were set based on the critical needs being supplied first.
From functional prototypes to production prototypes,
upgrade kits were structured and made available. In cases
where a new system was not required, customers had the
option of moving immediately to an upgrade. Also,
performance upgrades were designed to require a swap of the
processor card only.

Tracking the revision level of all hardware was essential to
achieving the third objective of minimizing downtime
because of hardware. Another key point was being able to
react to a customer's problem quickly. We used a prerelease
support team at another HP division to ensure timely
response. Spare material was purchased by the support team
and defective parts were returned to the lab for analysis.

Using all these methods, we were able to achieve the goal of
having all operational prototype units upgraded to
manufacturing release equivalence before manufacturing release.
This guaranteed test partners use of the machines for future
development without the "not-quite-final-product" concerns.

We were not without our share of problems in terms of
effectively managing the prototypes. For instance, several
units were placed inside an environmental test chamber for
weekend testing. During the early morning hours on a
Sunday, the temperature controller of the chamber went out
of control, ramping the temperature to beyond 70°C. The
additional heat caused the fire sprinkler system in the
chamber to turn on, flooding the chamber at a rate estimated at 10
gallons per minute. The units were standing in four feet of
water, but with the disk drives external to the chamber, the
test continued. When the chamber was finally shut down,
the water mopped up, and the results checked, it was
discovered that two of the seven units, which were on the top
rack, out of the standing water, continued to operate
without failure throughout the test. This test was affectionately
named the "bathtub test."

Time-to-Market Focus. Establishing the time to market as the
key objective for the program was not enough to ensure its
success. The teams involved required constant reminders to
stay focused on this objective and make trade-offs
accordingly. Once the schedule was confirmed and accepted, it was
important to acknowledge the progress. Any activities that
appeared in danger of jeopardizing the schedule were
reviewed and tackled accordingly.

However, the project team realized that in the past changes
to system requirements had a big impact on meeting project
schedules. Changes to system requirements to modify or
include a feature that might improve sales or could be easily
implemented at the cost of another metric might result in
significant changes to the hardware or operating system
design. In the case of the Model Ex5 and Series 9x8 servers,
the system team implemented a process that was also used
by the software development teams to control design
changes. This process is called change control, which
requires the change requester to provide a specific level of
information to determine whether a particular change is
viable. While this is not a new idea, the Model Ex5 and
Series 9x8 development team elected to make one
additional rule change. Each change request submitted would be
briefly looked at to determine how the change would affect
the base system. In other words, we wanted to ensure that a
change was critical enough that it needed to be added to the
products planned for the first release.

The hardware system team put on hold all change requests
that were determined not to be required for the first release.
To avoid causing lots of changes to the software after first
release, some of the critical enhancements that were
considered crucial to future sales were briefly reviewed and
included in the initial software release. In some cases this
meant no changes were required after the first software
release. However, there were some instances of patches
required for full functionality.

Customer Order Fulfillment Cycle Time 

For the Model Ex5 and Series 9x8 servers to stay competitive,
cost and performance were not the only items that played an
important role. During 1993, it was clear that HP had an order
fulfillment cycle time problem, which of course made our
customers unhappy and affected our competitiveness. A task
force was formed to address HP's order fulfillment cycle time
problems. We found out that results from this task force
would not arrive in time to help us with our new products.
Thus, we formed a team seven months before introduction to
ensure that the reduced order fulfillment cycle time process
for the Model Ex5 and Series 9x8 servers was in place when
the products were ready to be shipped to customers.



Our goal was to reduce the time between the receipt of a
customer purchase order for a system and the time when
the system is delivered to the customer site. We wanted to
reduce this time by 75% of what it was for our existing
servers. To accomplish this goal, the following changes were
made before product introduction:

• The product structure was made much simpler and it
includes fewer line items.

• Product offerings to distributors were unbundled.

• Product numbering for distributors' orders had a single SKU
(stock keeping unit) for ease of ordering.

• The rules for our factory configuration system and field
configuration system were mirrored.

• Early and proactive material stocking was performed before
introduction to ensure that plenty of material was on hand
to meet customer demand immediately.

• Factory acknowledgments were automated for clean
orders.

• Intensive training was given to order processing personnel
in the field and the factory about the Model Ex5 and Series
9x8 servers two months before introduction.

• Consignment, demonstration, and distributor units were
stocked before introduction.

• More capacity was added to the factory, and assembly
processes were streamlined.

• All new processes were tested intensively before
introduction.

With these steps we were able to meet and exceed our order
fulfillment goal.

Conclusion

The real success of the Model Ex5 and Series 9x8 server
program was that the goals for fast time to market and
reduced order fulfillment cycle time were achieved. These
were major accomplishments considering the events that
took place throughout the whole project including the
development of a major VLSI component, consolidation of the
design team from different divisions and locations,
communication between different manufacturing entities, and a
stream of last-minute catastrophes such as flooding
prototypes in the environmental test ovens and several
eleventh-hour VLSI bugs that had to be fixed.

Kyle Christensen, Bill Schaefer, Laura McMullen, and many
other individuals who helped make this program successful.

HP Distributed Smalltalk: A Tool for
Developing Distributed Applications

An easy-to-use object-oriented development environment is provided
that facilitates the rapid development and deployment of multiuser,
enterprise-wide distributed applications.

by Eileen Keremitsis and Ian J. Fuller 



HP Distributed Smalltalk is an integrated set of frameworks
that provides an advanced object-oriented environment for
rapid development and deployment of multiuser,
enterprise-wide distributed applications. Introduced in early 1993,
and now in its fourth major release, HP Distributed
Smalltalk leverages the ParcPlace Smalltalk language and the
VisualWorks development environment. Together, HP
Distributed Smalltalk and VisualWorks enable rapid
prototyping, development, and deployment of CORBA-compliant
applications.†

In the global marketplace, corporate information technology
needs are increasingly demanding because worldwide
competition requires geographically dispersed operations,
changing markets require agility to remain competitive, pressure
to improve return on investment requires strong cost
controls, timely access to complete information is crucial for
business success, and finally, corporate users require access
to both legacy and newly developed information sources
and applications.

HP Distributed Smalltalk helps answer these business needs
by supporting:

• Easy on-demand access to information and services across
the enterprise

• Dynamic interaction of distributed people and resources

• Greater application flexibility and ease of use

• Insulation from differences in operating environments

• An architecture that supports an evolutionary approach
including legacy system integration

• Industry standards that will allow application
interoperability across languages, high productivity, and code reuse.

Customers can take advantage of HP Distributed Smalltalk's
easy-to-use development environment to create distributed
solutions to compete effectively in the global marketplace.
For example, with HP Distributed Smalltalk, customers might
build on the sample Forum application (described later) so
that their geographically dispersed users can simultaneously
annotate a shared document. Also, customers might use HP
Distributed Smalltalk to create three-tiered database access
applications that extend the advantages of existing
client-server architectures for better isolation between user
interfaces, data manipulation models, and legacy and new data.

† CORBA, or Common Object Request Broker Architecture, defines a mechanism that enables
objects to make and receive requests and responses. HP Distributed Smalltalk's
implementation of this architecture is described later in this article.



Three-tiered applications are the most efficient and scalable
form of software design for building complex applications.
They carefully separate the user interface (tier one) from
the business rules governing the application (tier two) and
the persistent storage for the information in a database (tier
three). Each tier can reside on a different machine in a
network, making best use of the network resources. HP
Distributed Smalltalk contains objects that enable the
straightforward construction of these applications.
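The three-tier separation can be illustrated with a minimal sketch. It is written in Python rather than Smalltalk purely for illustration, and every class and method name in it (OrderStore, OrderRules, OrderView, and so on) is hypothetical; none of them comes from HP Distributed Smalltalk. Each tier talks only to the tier directly beneath it, which is what would allow the tiers to be placed on different machines.

```python
# Hypothetical three-tier sketch; all names are illustrative assumptions.

class OrderStore:
    """Tier three: persistent storage (a dict stands in for a database)."""
    def __init__(self):
        self._rows = {}

    def save(self, order_id, row):
        self._rows[order_id] = dict(row)

    def load(self, order_id):
        return dict(self._rows[order_id])


class OrderRules:
    """Tier two: business rules; no knowledge of screens or table layout."""
    def __init__(self, store):
        self._store = store

    def place_order(self, order_id, quantity, unit_price):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        total = quantity * unit_price
        self._store.save(order_id, {"quantity": quantity, "total": total})
        return total


class OrderView:
    """Tier one: user interface; only formats what tier two returns."""
    def __init__(self, rules):
        self._rules = rules

    def submit(self, order_id, quantity, unit_price):
        total = self._rules.place_order(order_id, quantity, unit_price)
        return f"Order {order_id}: total {total}"


store = OrderStore()
view = OrderView(OrderRules(store))
message = view.submit("A1", 3, 5)
```

Because the view never touches the store directly, the storage tier could be replaced by a real database without changing tier one.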

Using HP Distributed Smalltalk

An application written in HP Distributed Smalltalk is able to
respond to service requests from remote systems. Remote
entities that request services of an application do not have
to be written in HP Distributed Smalltalk as long as they are
in a system that implements the standard ORB (object
request broker) and common object services from the Object
Management Group (OMG). See "Object Management
Group" on page 86 for a description of these items.

In many cases an HP Distributed Smalltalk application's
component objects are distributed across several systems.
These distributed objects can interact seamlessly so that
end users are unaware of where the objects are located.

An overview of the process of running an HP Distributed
Smalltalk application is shown in Fig. 1. For incoming
requests to the service provider, the ORB translates requests
from the implementation-neutral Interface Definition
Language (IDL) to the local language (ParcPlace Smalltalk) and
forwards them to the correct local object for processing. To
complete the request, the service provider's ORB takes
return values, translates them to IDL, and forwards them to the
remote ORB from which the request was received.
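The request path just described (neutral request in, local call, neutral reply out) can be sketched as follows. This is a toy illustration in Python, not the product's mechanism: JSON is used here only as a stand-in for a language-neutral message form, and ToyBroker, Thermometer, and the message field names are assumptions invented for the example.

```python
import json

class Thermometer:
    """A hypothetical local service object."""
    def read(self, scale):
        celsius = 20.0
        return celsius if scale == "C" else celsius * 9 / 5 + 32


class ToyBroker:
    """Accepts a neutral request, calls the matching local object,
    and returns the result in the same neutral form."""
    def __init__(self):
        self._objects = {}

    def register(self, name, obj):
        self._objects[name] = obj

    def handle(self, request_bytes):
        # Translate the neutral request into a local method call.
        request = json.loads(request_bytes.decode("utf-8"))
        target = self._objects[request["object"]]
        method = getattr(target, request["operation"])
        result = method(*request["arguments"])
        # Translate the return value back to the neutral form.
        return json.dumps({"result": result}).encode("utf-8")


broker = ToyBroker()
broker.register("thermo", Thermometer())
reply = broker.handle(
    b'{"object": "thermo", "operation": "read", "arguments": ["F"]}'
)
value = json.loads(reply)["result"]
```

The requester only ever sees bytes in the neutral form, so it does not need to know which language the service object is written in.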

Not only does HP Distributed Smalltalk support distributed
application delivery but it also provides an environment for
distributed application development, which includes:

• A complete implementation of the Object Management
Group's latest standards

• A rich suite of tools for application development and
administration including simulated remote test support, a remote
debugger, and an IDL interface browser and generator

• A user interface environment and sample applications that
developers can reuse or extend, or simply use to become
familiar with the system.

Object Management Group



The Object Management Group, or OMG, is a nonprofit international corporation
made up of a team of dedicated computer industry professionals from different
corporations working on the development of industry guidelines and object
management specifications to provide a common framework for distributed application
development.

OMG publishes industry guidelines for commercially available object-oriented
systems, focusing on areas of remote object network access, encapsulation of
existing applications, and object database interfaces. By encouraging industrywide
adoption of these guidelines, OMG fosters the development of software tools that
support open architecture, enabling multivendor systems to work together.

To define the framework for fulfilling its mission, in 1992 OMG published its Object
Management Architecture Guide. This guide provides a foundation for the
development of detailed interfaces that will connect to the elemental components of the
architecture. Fig. 1 shows the four main components of this architecture:

• The object request broker (ORB) enables objects to make and receive requests and
responses in a distributed object-oriented environment.

• Object services is a collection of services with object interfaces that provide basic
functions for creating and maintaining objects.

• Common facilities is a collection of classes and objects that provide general-purpose
capabilities useful in many applications.

• Application objects are specific to particular end-user applications.

Fig. 1. The object management architecture.

The application objects, object services, and common facilities represent groupings
of objects that can send and receive messages. The software components in each of
these primary components have application programming interfaces that permit their
participation in any computing environment that is based on an object technology
framework.



In addition, because HP Distributed Smalltalk is an
extension of VisualWorks, developers are able to do their
programming in a language they already know (ParcPlace
Smalltalk) using the VisualWorks application builder.

VisualWorks is an implementation of the Smalltalk
programming language and environment. It provides an excellent
environment for building standalone and simple client/server
applications that are 100% portable between many of the
major computing platforms and operating systems. HP saw
an opportunity to enhance the capabilities of VisualWorks to
be the basis for next-generation applications by adding
objects that enable VisualWorks systems to communicate
directly using a standardized set of communications facilities.

Framework 

The HP Distributed Smalltalk framework is an environment
that encompasses everything from communication with
other systems through database access to the
object-oriented ParcPlace Smalltalk language and a rich suite of
developer's tools, all seamlessly integrated to facilitate
distributed application development.

The major components of HP Distributed Smalltalk are
shown in Fig. 2 and briefly defined below:

• HP Distributed Smalltalk ORB. This is a full implementation
of the Object Management Group's Common Object Request
Broker Architecture (CORBA).

• Remote Procedure Call (RPC) communication. This
component supports efficient and reliable transfer of messages
between systems.

• HP Distributed Smalltalk object services. This includes all
standard object services required by distributed systems, as
well as support for creating and maintaining objects and the
relationships between them.

• Multiplatform support. HP Distributed Smalltalk
applications that run on one platform (hardware and operating
system combination) can run, without porting, on any other
supported platform.

• OODBMS and RDBMS access. HP Distributed Smalltalk
provides database access directly to HP's Odapter† and Servio's
GemStone as well as to Sybase and Oracle (via
VisualWorks). HP Odapter can be used to provide access to a
variety of other database systems.

• HP Distributed Smalltalk developer tools and services. This
level of the framework provides support specifically
designed for developing, testing, tuning, and delivering
distributed applications. HP Distributed Smalltalk incorporates a
rich development environment, application builder support,
and the ParcPlace Smalltalk language.

• HP Distributed Smalltalk user environment and services.
These services include a reusable demonstration user
interface and desktop environment support for users' work
sessions and normal desktop activity.

• HP Distributed Smalltalk sample application objects. These
objects provide developers with example code that can be
reused or extended, or can provide a source of ideas for
developing alternate applications.

Fig. 1. Overview of the HP Distributed Smalltalk process.

Fig. 2. The major components of HP Distributed Smalltalk.

The following sections provide more detailed descriptions
of the components that make up HP Distributed Smalltalk.

HP Distributed Smalltalk Object Request Broker

HP Distributed Smalltalk is a complete implementation of
CORBA, the Object Management Group's specification of an
object request broker. HP Distributed Smalltalk's compliance
provides the basis for object and application interoperability.

CORBA specifies core services that are required of an object
request broker to support interoperable distributed
computing. The CORBA specification includes the following core
services.

Interface Definition Language Compiler. OMG has defined the
Interface Definition Language, or IDL, to be independent of
other programming languages. Interfaces for objects that can
provide distributed services are written in IDL so that they
are accessible to service requesters that might be written in
Smalltalk, C, C++, or another language.

† HP Odapter is a complementary product from Hewlett-Packard that provides an efficient
and scalable link between objects implemented in an object-oriented language such as
Smalltalk or C++ and the entities in an Oracle relational database.

OMG recently approved the IDL-to-Smalltalk language
binding proposed by HP and IBM. This is important because it
allows users to build distributed systems using multiple
languages where appropriate, allowing a Smalltalk object to be
able to request services of a C++ object or vice versa.

Interface Repository. This service provides a registry of
distributable object interfaces for a given system. Any object that
remote objects can access has an interface in the interface
repository. For example, when objects on two or more
systems at different locations collaborate in an application, they
interact by sending messages to their interfaces. Since
external clients have access to an object's services only
through the object's interface, the implementation of the
object is private. This privacy provides a variety of benefits,
including security, language independence, and freedom to
modify the implementation of how a service is performed
without external repercussions.
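The idea that clients reach an object only through a registered interface can be sketched in a few lines. The Python below is an illustrative assumption, not the CORBA interface repository API: InterfaceRepository, invoke, and Counter are hypothetical names, and the repository is reduced to a mapping from an interface name to its permitted operations.

```python
class InterfaceRepository:
    """Hypothetical registry: interface name -> set of public operations."""
    def __init__(self):
        self._interfaces = {}

    def register(self, interface_name, operations):
        self._interfaces[interface_name] = frozenset(operations)

    def operations(self, interface_name):
        return self._interfaces[interface_name]


class Counter:
    """Implementation object; its internals are not part of the interface."""
    def __init__(self):
        self._count = 0

    def increment(self):
        self._count += 1
        return self._count

    def _reset(self):
        self._count = 0


def invoke(repo, interface_name, obj, operation, *args):
    """Allow a call only if the operation is in the registered interface."""
    if operation not in repo.operations(interface_name):
        raise PermissionError(f"{operation} is not part of {interface_name}")
    return getattr(obj, operation)(*args)


repo = InterfaceRepository()
repo.register("Counter", ["increment"])
counter = Counter()
first = invoke(repo, "Counter", counter, "increment")
```

Since callers depend only on the registered operation names, the way Counter stores its count could change without affecting them.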

HP Distributed Smalltalk ORB Support. The object request
broker (ORB) is the key to providing support for distributed
objects. By providing an ORB on each system, HP
Distributed Smalltalk makes the location of any object transparent
to clients requesting services from the object.

When a message is sent to a local object, the activity is
handled normally. When a message is sent to a remote object,
the remote object's local surrogate (created automatically
by the ORB) intercepts the message, then uses the ORB to
locate the remote object and communicate with it (see Fig.
3). Results returned to the calling object appear exactly the
same, whether the message went to a local or remote object.
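A surrogate of this kind can be approximated in Python with attribute interception. The sketch below is an assumption-laden illustration, not the product's implementation: ToyOrb, Surrogate, and RemoteAccount are hypothetical names, and the "remote" object simply lives in a dictionary inside the toy ORB instead of on another machine.

```python
class RemoteAccount:
    """Stands in for an object that lives on another system."""
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance


class ToyOrb:
    """Hypothetical broker: locates exported objects and forwards calls."""
    def __init__(self):
        self._exported = {}
        self.calls = []  # record of forwarded messages, for illustration

    def export(self, object_id, obj):
        self._exported[object_id] = obj

    def send(self, object_id, operation, args):
        self.calls.append((object_id, operation))
        return getattr(self._exported[object_id], operation)(*args)


class Surrogate:
    """Local stand-in that intercepts messages and hands them to the ORB."""
    def __init__(self, orb, object_id):
        self._orb = orb
        self._object_id = object_id

    def __getattr__(self, operation):
        # Called only for names not found on the surrogate itself.
        def forward(*args):
            return self._orb.send(self._object_id, operation, args)
        return forward


orb = ToyOrb()
orb.export("acct-1", RemoteAccount(100))
local = RemoteAccount(100)
remote = Surrogate(orb, "acct-1")
local_result = local.deposit(50)
remote_result = remote.deposit(50)
```

The calling code is identical in both cases; only the surrogate knows that the second call was routed through the broker.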

An ORB's responsibilities include: 

• Marshalling and unmarshalling messages (translating
objects to and from byte streams for network transmission)

• Locating objects in other images or systems

• Routing messages between surrogates and the objects they
represent.
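As an illustration of the first responsibility, the following Python sketch marshals a message into a byte stream and back. The wire layout (two 16-bit counts, the operation name, then 32-bit integer arguments in network byte order) is an assumption made up for this example and is not the encoding used by CORBA or by HP Distributed Smalltalk.

```python
import struct

HEADER = "!HH"  # assumed layout: name length, argument count (big-endian)

def marshal(operation, arguments):
    """Translate an operation name and integer arguments to bytes."""
    name = operation.encode("utf-8")
    header = struct.pack(HEADER, len(name), len(arguments))
    body = struct.pack(f"!{len(arguments)}i", *arguments)
    return header + name + body

def unmarshal(data):
    """Translate bytes produced by marshal() back to name and arguments."""
    name_len, arg_count = struct.unpack_from(HEADER, data, 0)
    offset = struct.calcsize(HEADER)
    operation = data[offset:offset + name_len].decode("utf-8")
    offset += name_len
    arguments = list(struct.unpack_from(f"!{arg_count}i", data, offset))
    return operation, arguments

wire = marshal("move", [3, -4])
decoded = unmarshal(wire)
```

A fixed byte order matters here because the sending and receiving machines may differ in their native integer layout.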

While a request is active, both client and server ORBs
exchange packet information to track the course of the request
and resolve any network or transmission errors that might
occur.

Fig. 3. HP Distributed Smalltalk handles remote access so that a
request to a remote object appears the same as a request to a local
object.

Object Services and Policies 

Object services extend the core ORB services to support
more advanced object interaction. HP Distributed Smalltalk
implements OMG's Common Object Services Specification
(COSS), which extends CORBA to provide protocols for
common operations like creating objects, exporting and
destroying objects (life cycle), locating objects (naming), and
asynchronous event notification. Additional object services and
policies provide efficient interaction between finer-grained
distributable objects.

Naming.† There is a standard for assigning each object a
unique user-visible name. Names are used to identify and
locate both local and remote objects.
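A name service of this sort reduces, at its simplest, to binding unique names to object references and resolving them later. The Python sketch below is a hypothetical illustration (NamingContext, bind, and resolve are names chosen for the example, and a reference is modelled as a plain host and object-number pair); it does not reproduce the COSS naming interfaces.

```python
class NamingContext:
    """Hypothetical name table: unique user-visible name -> reference."""
    def __init__(self):
        self._bindings = {}

    def bind(self, name, reference):
        # Names must be unique, so a second bind of the same name fails.
        if name in self._bindings:
            raise KeyError(f"name already bound: {name}")
        self._bindings[name] = reference

    def resolve(self, name):
        return self._bindings[name]


names = NamingContext()
names.bind("printers/laser", ("hostA", 17))   # remote object
names.bind("documents/plan", ("local", 3))    # local object
found = names.resolve("printers/laser")
```

The caller uses the same resolve call whether the reference it gets back points to a local or a remote object.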

Event Notification.† This is a service that allows objects to
notify each other of an interesting occurrence using an
agreed protocol and set of objects.
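The pattern behind event notification can be sketched as a channel that passes each event from a supplier to every connected consumer. This Python fragment is only an illustration under assumed names (EventChannel, connect, push); it delivers events synchronously within one process and is not the COSS event service interface.

```python
class EventChannel:
    """Hypothetical channel: suppliers push events, consumers receive them."""
    def __init__(self):
        self._consumers = []

    def connect(self, consumer):
        # A consumer is any callable that accepts one event argument.
        self._consumers.append(consumer)

    def push(self, event):
        for consumer in self._consumers:
            consumer(event)


received = []
channel = EventChannel()
channel.connect(received.append)
channel.connect(lambda event: received.append(event.upper()))
channel.push("document saved")
```

The supplier needs no knowledge of who is listening; it agrees with consumers only on the channel and the shape of the event.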

Basic† and Compound Life Cycle. There are standard ways for
objects to implement activities such as create and initialize,
delete, copy, and move both simple and compound objects,
externalize (prepare for transmission to remote systems),
and internalize (accept objects transmitted from remote
systems). Compound objects, built from simple objects, can
include application components, anything that appears on a
user's desktop (such as a document, a mail handler, or a
graphics toolbox), complete applications, and so on.

Relationships: Containment and Links. Links allow networked
relationships among objects. Objects can be linked together
with various levels of referential integrity (determining how
to handle situations when one of the parties to the link is
deleted), and in one-to-one, one-to-many, and many-to-many
relationships.
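What "levels of referential integrity" might mean in practice can be shown with two deletion policies. The sketch below is a hypothetical Python illustration: the policy names "nullify" (drop the link, keep the other party) and "cascade" (delete the other party too) are assumptions chosen for the example, as are LinkStore and its methods; the article does not say which policies the product offers.

```python
class LinkStore:
    """Hypothetical object graph with per-link deletion policies."""
    POLICIES = ("nullify", "cascade")

    def __init__(self):
        self.objects = set()
        self.links = []  # (source, target, policy) tuples

    def add(self, name):
        self.objects.add(name)

    def link(self, source, target, policy):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.links.append((source, target, policy))

    def delete(self, name):
        if name not in self.objects:
            return
        self.objects.discard(name)
        touching = [l for l in self.links if name in (l[0], l[1])]
        # Every link that involved the deleted object is removed.
        self.links = [l for l in self.links if name not in (l[0], l[1])]
        for source, target, policy in touching:
            if policy == "cascade":
                other = target if source == name else source
                self.delete(other)


graph = LinkStore()
for item in ("folder", "memo", "note"):
    graph.add(item)
graph.link("folder", "memo", "cascade")   # memo cannot outlive folder
graph.link("folder", "note", "nullify")   # note survives, link is dropped
graph.delete("folder")
```

One-to-many relationships fall out naturally here, since a single object may appear in any number of link tuples.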

Together with links, containment establishes and maintains
relationships between objects. Each object has a specific
location within some container. Containers are related
hierarchically. HP Distributed Smalltalk provides objects that
implement a generic distributed container. Programmers
can use these objects to build specific implementations such
as an electronic mail envelope (containing components of a
message) or a bill of sale (containing information about
items in a shipment) with minimal extra programming.

Properties and Property Management. Properties are part of an
object's external interface (owner, creation date,
modification date, version, access control list, and so on). They are a
dynamic version of attributes.

Application Objects and their Assistants. Application objects
are relatively large-grained compound objects that end users
deal with (e.g., a file folder or an order entry form).
Application assistants are lightweight objects that implement
most of the policies and participate in most of the services
that desktop objects need to participate in. Application
assistants function as the developer's ambassador into the
object services. Application assistants can be stored and
activated efficiently and provide the basis for future
transaction support.

† This service is specified in COSS 1.0.

Fig. 4. The bulk of user interaction is with local presentation
objects, minimizing and condensing the need to propagate semantically
relevant changes over the network. Here for example, a user might
choose to look at a chart (semantic object) as a pie, line, or bar chart
presentation object.

Presentation/Semantic Split. A logical split between distributed
objects, the presentation/semantic split provides an
efficient architecture for distributed applications. Local
presentation objects handle the bulk of user interaction, while a
semantic object (which can be anywhere on the network)
holds a shared persistent state of the object (see Fig. 4).

By using the presentation/semantic split, the designer can
choose what part of the application should be shared and
what should be unique to each user. Applications that might
use the presentation/semantic split include a team whiteboard
where all behavior is shared but each user can write
comments, or a common document with pages that are
unique to each user so that all users can read at their own
pace. A variety of sample applications included with HP
Distributed Smalltalk provide illustrations of how to use the
presentation/semantic split.
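
The split can be illustrated with a small Python sketch in which one semantic object holds the shared chart data and several presentation objects render it in their own style. The class and method names are hypothetical, both kinds of object live in one process here, and the rendering is reduced to a string; in the product the semantic object could be anywhere on the network.

```python
class ChartSemantic:
    """Shared state of the chart; one instance serves many presentations."""

    def __init__(self, values):
        self._values = dict(values)
        self._presenters = []

    def attach(self, presenter):
        self._presenters.append(presenter)
        presenter.refresh(self._values)

    def set_value(self, label, value):
        # Only semantically relevant changes are propagated.
        self._values[label] = value
        for presenter in self._presenters:
            presenter.refresh(self._values)


class ChartPresentation:
    """Local, per-user view; the style choice never leaves this object."""

    def __init__(self, style):
        self.style = style            # "pie", "line", or "bar"
        self._last_values = {}
        self.rendered = ""

    def refresh(self, values):
        self._last_values = dict(values)
        body = ", ".join(f"{k}={v}" for k, v in sorted(self._last_values.items()))
        self.rendered = f"{self.style}: {body}"

    def change_style(self, style):
        # Purely local interaction: no message goes to the semantic object.
        self.style = style
        self.refresh(self._last_values)


chart = ChartSemantic({"east": 10, "west": 7})
pie, bar = ChartPresentation("pie"), ChartPresentation("bar")
chart.attach(pie)
chart.attach(bar)
chart.set_value("west", 9)     # shared change reaches both views
bar.change_style("line")       # local change affects one view only
```

Switching a view from bar to line touches only that presentation object, which is the kind of interaction the split keeps off the network.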

While use of the presentation/semantic split is optional, it
facilitates and optimizes distributed application development
and execution. Advantages of using the presentation/semantic
split include:

• Acceptable performance levels even over wide area
networks

• Association of a single semantic object with multiple
presentation objects, a critical feature in distributed computing
environments where it is common for many users to work
with the same application

• Application access independent of local windowing systems

• Better code reusability.

The HP Software Solution Broker described on page 93 is a
good example of using the presentation/semantic split in an
application.

Developer Services

HP Distributed Smalltalk also extends VisualWorks with
services that support development and test of distributed
applications.

Fig. 5. The control panel provides an easy-to-use interface to
administrative and developer services.

Control Panel. The technical user interface to HP Distributed
Smalltalk for administrators and developers is invaluable for
testing and maintenance (Fig. 5). The control panel provides:

• Controls to start and stop the system cleanly

• Support for local RPC testing (simulated distribution)

• Tracing facilities to log network conversations between
objects

• Performance monitoring.

Interface Repository Browser and Editor. The interface
repository browser provides an iconic view of the contents of the
interface repository, where publicly available interfaces are
specified (see Fig. 6). It is organized hierarchically so that
developers can explore and test interfaces and construct
requests to use the interfaces.

Shared Interface Repository. In HP Distributed Smalltalk,
users can share an interface repository on a remote system
so they do not have the overhead of keeping a copy of all of
the interfaces on every system. The product also supports
version management of interfaces, which is very important
in large-scale, evolving distributed systems.

Remote Context Inspector and Debugger. This service is an
extension that allows debugging on remote images when
appropriate. It supports object inspection and debugging for
the entire distributed execution context, including
communication between images. Fig. 7 shows using the debugger to
step through code and inspect objects that might be located
anywhere in a distributed environment.

Fig. 7. Screens associated with a

Stripping Tool. To prepare an application for delivery,
developers use the HP Distributed Smalltalk stripping tool to
remove unneeded classes and interfaces and seal source code
when application development is complete. The stripping
tool's user interface suggests likely items for removal (see
Fig. 8).

User Services 

User services allow developers to build a desktop or office
environment and control activities during a session.

System Objects. HP Distributed Smalltalk supports a variety
of system objects: user, session, clipboard, wastebasket, and
orphanage.

• User object. This object contains information held about
end users of the system including who they are, how to
contact them, and so on. User objects may be included by
reference in other objects. For example, a user might include a
business card in a memo that would enable the receiver to
get in touch with the sender.

• Session object. All the information required about the state
of a user's environment, including user login, preferences,
layout, and so on, is contained in a session object. The
session object also supports the notion of workspaces, with the
potential for developing richer workspace environments. It
has no icon on the desktop but it interacts with and
supports other application objects.

• Clipboard. This is a container for objects that are being cut,
moved, or copied from one location to another.

• Wastebasket. This container receives objects that users
throw away. The wastebasket can be cleared when it gets
too full.

• Orphanage. This is a container for holding objects that are
no longer needed.

Security. Developers can use or extend HP Distributed
Smalltalk's access control services in the applications they
build, setting controls for host systems, users, or both.
Host-system access control lets developers determine whether an
image can receive messages from another system. User-level
access control lets a developer determine whether a given
user has any one of several kinds of privileges (e.g., read or
write privilege) for a given object.

Developers can administer access control programmatically
or from the default user interface.
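
The two levels of control can be sketched as a host check followed by a per-user privilege check. This Python sketch is only an illustration of the idea: the AccessControl class, its methods, and the host, user, and object names are hypothetical and are not the product's access control interface.

```python
class AccessControl:
    """Hypothetical two-level check: host system first, then user privilege."""

    def __init__(self):
        self._trusted_hosts = set()
        self._grants = {}             # (user, object_id) -> set of privileges

    def trust_host(self, host):
        self._trusted_hosts.add(host)

    def grant(self, user, object_id, privilege):
        self._grants.setdefault((user, object_id), set()).add(privilege)

    def may(self, host, user, object_id, privilege):
        if host not in self._trusted_hosts:
            return False              # host-system access control
        # User-level access control for the given object.
        return privilege in self._grants.get((user, object_id), set())


acl = AccessControl()
acl.trust_host("hq-server")
acl.grant("ann", "sales-report", "read")

can_read = acl.may("hq-server", "ann", "sales-report", "read")
can_write = acl.may("hq-server", "ann", "sales-report", "write")
from_unknown_host = acl.may("laptop-7", "ann", "sales-report", "read")
```

This sketch applies both checks together; the text also allows either control to be used on its own.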

Example Code 

While all HP Distributed Smalltalk code is available to read,
reuse, or extend, the default user interface and certain
sample applications may be the best place to start.

Fig. 9. The screen for presenting the office metaphor and some
typical objects in an office.

User Interface. HP Distributed Smalltalk uses and provides
support for a user interface based on an office metaphor,
which is designed for easy use and understanding. In the
default user interface, all the objects a user works with
locally (folders, file cabinets, documents, and so on) are
contained in an office. All offices on the same system are in the
same building. Users can navigate between buildings to
access objects in other offices. Fig. 9 shows a typical office
and some of the objects available in an office.

Sample Applications. Sample applications illustrate the use of
distributed objects. For example, the Forum (Fig. 10)
provides a shared window in which several users can view and
annotate a picture or document. The Notebook is a place to
store both local and remote objects on a desktop.

Users can also build their own objects from any of the
simple objects available, including a table, chart, input field,
picture, and text window (see Fig. 11). The sample
applications can be extended and customized to create a variety of
simple distributed applications.

Creating Applications 

HP Distributed Smalltalk allows VisualWorks programmers
to create distributed applications quickly and easily. Building
on the benefits of Smalltalk and VisualWorks, HP Distributed
Smalltalk users can build CORBA-compliant applications
either from scratch or by modifying existing applications.
Like any Smalltalk application, the distributed development
process is iterative and designed for dynamic refinement.

Development. Distributed application development is a
four-step process.

1. Design and test the application objects locally.

2. Define the object interfaces and register them in the
interface repository.

3. Use HP Distributed Smalltalk's simulated remote testing
tools (which actually use the ORB to marshal and
unmarshal object requests) to verify the interfaces specified in the
interface repository.

4. Track messages and tune performance.
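
Steps 2 to 4 can be pictured with a loopback sketch: a request is checked against a registered interface, marshaled, unmarshaled, and dispatched to an object in the same process, with each marshaled message kept in a trace. Everything here is an assumption made for illustration: the code is Python, JSON stands in for the ORB's marshaling, and the LoopbackBroker and Whiteboard names and the dictionary used as an interface repository are hypothetical.

```python
import json


class Whiteboard:
    """Application object, designed and tested locally first (step 1)."""

    def __init__(self):
        self.notes = []

    def add_note(self, author, text):
        self.notes.append((author, text))
        return len(self.notes)


# Step 2: a stand-in interface repository mapping operations to parameters.
INTERFACES = {"Whiteboard": {"add_note": ["author", "text"]}}


class LoopbackBroker:
    """Step 3: marshal and unmarshal each request, but deliver it locally."""

    def __init__(self, interfaces):
        self.interfaces = interfaces
        self.objects = {}
        self.trace = []               # step 4: log of marshaled requests

    def register(self, name, interface, obj):
        self.objects[name] = (interface, obj)

    def request(self, name, operation, **args):
        interface, target = self.objects[name]
        expected = self.interfaces[interface].get(operation)
        if expected is None or sorted(args) != sorted(expected):
            raise TypeError(f"{interface}.{operation} does not accept {sorted(args)}")
        wire = json.dumps({"target": name, "op": operation, "args": args})
        self.trace.append(wire)
        message = json.loads(wire)    # what a remote side would receive
        result = getattr(target, message["op"])(**message["args"])
        return json.loads(json.dumps(result))


board = Whiteboard()
broker = LoopbackBroker(INTERFACES)
broker.register("team-board", "Whiteboard", board)
count = broker.request("team-board", "add_note", author="ann", text="hello")
```

Because the request already passes through marshaling, anything that cannot cross the wire (or that disagrees with the registered interface) fails during local testing rather than after distribution.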

Distribution. Once an application is developed, tested, and
tuned locally, it is easy to set it up for distributed use.

5. Copy the application classes to the Smalltalk images they
will run on.

6. Update the interface repositories in these images.

The application can then run in the fully distributed
environment without further change. Except for actual packet
transfer, the distributed application is identical to the simulated
remote application developed, tuned, and tested during
development.

Delivery. Once the application is tested, developers can
deliver it to their users by stripping the environment of
unneeded objects and tools. Once stripped, the application
looks exactly the same as applications developed in other
languages and can be executed on any supported platform,
including HP-UX,* SunOS/Solaris, IBM AIX, Microsoft®
Windows, Microsoft Windows NT, or IBM OS/2. Support for
these platforms is available under a run-time license from
Hewlett-Packard.

with X/Open's* XPG4, POSIX 1003.1, 1003.2, FIPS 151-1, and SVID2 interface specifications.

Fig. 11. Sample objects provided with HP Distributed Smalltalk.



92 April 1995 Hewlett-Packard Journal



© Copr. 1949-1998 Hewlett-Packard Co.



A Software Solution Broker for 
Technical Consultants 



A distributed client-server system gives HP's worldwide technical
consultants easy access to the latest HP and non-HP software products
and tools for customer demonstrations and prototyping.

by Manny Yousefi, Adel Ghoneimy, and Wulf Rehder



On a typical working day an HP consultant, one of thousands
worldwide, sits down with a customer to solve a business
problem. The challenge, the customer may tell the consultant,
is to move sales data to headquarters more quickly so that
management can make timely strategic decisions. For a
solution, the consultant might propose a decision support
system that integrates the customer's older legacy system
where the sales data has been stored traditionally with a
faster "warehouse" database and easy access tools that
present the information in just the form needed, right on the
customer's desktop. "Let me show you what I mean," the
consultant says, turning on a laptop computer (which had
previously been connected to a LAN or telephone socket).
Navigating through the windows on the screen, the
consultant invites the customer to look through a virtual shelf
filled with databases and access tools, all represented by
icons, together with middleware and application development
toolkits (see page 98). The consultant clicks on an
icon and the tool becomes immediately available for browsing
or for self-paced learning. From here the consultant may
show one of the demos that are included, or navigate the
customer through a hypertext document to more information,
alternate products, additional options, and prefabricated
software building blocks. No wonder that this virtual software
laboratory is called by HP consultants "the software
sandbox." This consultant is actually building, from the
tool and product portfolio in front of them, a prototypical
decision support system for this customer. How much of
this is fantasy and how much reality?

The answer is that it is all reality now. The software sandbox
that the consultant was starting to "play in" is called the HP
Software Solution Broker (or Broker, for short) and is
available now to HP consultants. Defining and creating a decision
support system is, of course, not play but serious work.
However, the ease and immediacy of the Broker, the ample
choices, and many helpful hints make even urgent business
problem solving an experimental sport. Best of all, the
consultant receives these products and tools, together with
support and on-line documentation, free of charge. For this
convenience, substantial research efforts had to be poured into
building such a virtual software depot, using HP's own
hardware platform and the most advanced object technology.
Before explaining this implementation more systematically, it
is useful to watch our technical consultant and the customer
at work.



Using the Software Solution Broker

To get a feeling for how the Software Solution Broker is used
we will briefly watch the technical consultant show the
customer how to build a prototypical decision support system.

After clicking on the icon in the ORB control panel, which
starts the object request broker (an action that in effect
opens the lid covering the sandbox), the consultant activates
the Software Solution Broker icon. Another window opens
offering the Broker's classification of products, either by
vendor, by technology, or by product name. (Alternative paths
into the Software Solution Broker, such as a classification
by business problem, are under development.) Choosing the
i (information request) button for technology, the consultant
asks whether the customer wants to see database information
first or options for the user interface. As an executive,
the customer is eager to see or build a nice GUI. Clicking
on the graphic user interface i button brings up several
choices of which three are shown in Fig. 1. Having heard
about VisualWorks, the customer selects it and is presented
with the VisualWorks Showcase.

The consultant then shows a VisualWorks demonstration to
explore with the customer what kind of data display
windows, control buttons, analysis tools, and other features
would be appropriate. After jotting down these initial
requirements the consultant is ready to build a first prototype.
The help button launches a palette of GUI building tools,
and it takes only minutes to draw an example of a transaction
entry tool for the transactions underlying the decision
support system the customer wants built (see Fig. 2). Here
the customer interrupts and requests that the data be shown
in spreadsheet form as well as graphically. They agree on
bar chart and pie chart presentations for a first cut and
proceed to discuss the requirements for the underlying
database. The Software Solution Broker has a "virtual shelf" of
relational databases that work with VisualWorks, and among
these the customer may have a favorite system, or an
already installed legacy database. They again discuss the pros
and cons while viewing various product demonstrations.

We meet the customer and the consultant again after
another hour or so. By then the VisualWorks front-end tool
displays some real data pulled from a database (Fig. 3). At
this point we leave the executive's office and describe how
the Software Solution Broker is constructed.

Fig. 1. Software Solution Broker user interface.



Constructing the Software Solution Broker 

Two considerations determined the architecture and
consequently the implementation of the Software Solution Broker.
First, since the products on the Broker have to be accessible
worldwide but will be updated and maintained locally, the
global partitioning between distributed users and a central
server functionality called for a client/server implementation
on a wide area network (WAN). Secondly, the need to
accommodate many different types of clients and to be able
to encapsulate many different products in the software
server strongly suggested vendor independence (openness)
and adherence to certain industry standards such as the
Common Object Request Broker Architecture (CORBA).

Software Substrate

Here we will not focus on the WAN implementation but
instead will concentrate on the software substrate on which
the Software Solution Broker is built. In the software
substrate (see Fig. 4) we include the entire software kit
composed of server and client development tools, tools for
building the client/server interaction components of the
system, and repository tools. Repository tools are essential for
the construction of a depot that contains the information in
the system, including the logic for accessing this information.
After a careful technical analysis of five alternative complete
substrate kits, VisualWorks from ParcPlace Systems was
chosen as the development software for the PC, UNIX® client,
and UNIX server, while HP's Distributed Smalltalk (see the
preceding article), which also works with VisualWorks, was the
tool of choice to build and manage the client/server
interaction. All system information (e.g., documentation) at this
time of writing (release 2.0) still resides with the products
and a central repository has not yet been chosen. Tools such
as Object Lens (working with VisualWorks) or HP Odapter
make relational databases look like object databases, so we
know that the selection of a repository can be made very
quickly when needed.

VisualWorks was the easy winner because it provides a
complete environment for the development of true graphic
applications that run unchanged on UNIX-system-based, PC, and
Macintosh computers under their native windowing systems.
Three of VisualWorks' features made it especially appropriate
for the Software Solution Broker:

• VisualWorks is built on Smalltalk, a pure object-oriented
language designed for fast modular design.

• VisualWorks possesses a tested set of development tools,
including browsers for object classes, a thread-safe
debugger, and a change manager to track modifications to the
code, as well as an inspector for use in testing.







• VisualWorks has a large class library of more than 350 types
of portable objects. These include a rich user interface
development toolkit suitable for all major windowing systems.

Fig. 2. A window within the Software Solution Broker showing VisualWorks tools for prototyping a customer application.

HP Distributed Smalltalk extends VisualWorks' capability
for developing standalone systems into an environment for
creating distributed object systems (see Fig. 5) by adding
the following:

• A full implementation of the Object Management Group
(OMG) Common Object Request Broker Architecture
(CORBA) core services

• Common Object Services for life cycle operations such as
creating objects and the relationships between them

• Sample application objects, for example for the modular
partitioning of client/server functionality into semantic and
presentation objects.

These objects and services for building distributed
applications are portable to all platforms supported by VisualWorks.
Furthermore, they are compatible with the OMG CORBA
standards. HP's Distributed Smalltalk provides seamless
support of client/server interactions between VisualWorks
images. CORBA compliance makes our Software Solution
Broker implementation open and capable of interoperability,
for instance with C++ CORBA-compliant applications, and
as soon as HP Distributed Smalltalk is OSF DCE-compliant,*
also with DCE remote procedure calls (RPCs). For the
current release, TCP/IP or HP Sockets are being used.

* OSF DCE is the Open Software Foundation's Distributed Computing Environment.

Product Encapsulation 

Everybody who has worked with spreadsheets, word
processors, or CAD systems knows that similar or identical
functionality does not mean that the user interfaces and
more generally the visual, iconic, and mental models are
comparable. For the Software Solution Broker, too, each
product has its own artifacts and idiosyncrasies, its own
look and personality by which we can identify it when we
see it in use or on the shelf of a vendor. This unavoidable
fact poses challenges for the "virtual shelf" of the Broker.
Without wanting to blot out the individuality of a vendor's
offering, it was the objective of the development team to
minimize the effort needed for the user to get accustomed to
this diversity. Generally speaking, the variety has to be hidden
behind a simple and consistent, product-independent mode
of access with uniform and intuitive graphical symbolism. A
particular example is the double click used consistently to
launch an application.

Fig. 3. Prototype display for a customer application constructed using the Software Solution Broker to select user interface and database



Encapsulation, in the context of the Software Solution
Broker, describes a body of activities and software mechanisms
that have two purposes: to integrate each product within the
overall product portfolio so that the consultant can use it in
its native mode, and to provide a uniform way to access the
products, their associated tools, and other artifacts. This
accessibility, it should be noted, is restricted to the features
and artifacts that are relevant to consulting work with the
customer. This means the consultant can access editors,
executable code, and documentation, but isn't able to change
the internal product configuration, the way it is stored and
administered in folders, or the source code. Because of the
intrinsic symmetry between Software Solution Broker
servers and clients (see Fig. 4) the encapsulation can be done
either on the server side or on the client side, provided the
classes FolderPlusPO and EncapsulationDialog are present in the
client. These two classes will be discussed below.

Fig. 4. Software Solution Broker
software substrate, showing the
client/server architecture, the
user interface engine (Visual-

It is the already mentioned semantic/presentation split,
together with object-oriented features such as inheritance and
polymorphism, that make the encapsulation effortless. The
semantic/presentation object distribution model is HP
Distributed Smalltalk's implementation of a distributed client/
server architecture. In this model classes always appear in
logical pairs, one representing the server semantics, the
other their presentation in the client. Consequently, the class
instances or objects also come in pairs. Take for instance
the window object. Every window is composed of two
logical parts: its shared (semantic) properties such as its
rectangular shape, and its local and personal (presentation)
attributes such as color. In general, a semantic object often
has (and controls) many different presentation objects,
which in the case of the Software Solution Broker handle
the remote user interactions, thus reducing network traffic.
For instance, one semantic data display object creates and
controls different presentations of the data as a bar chart
and a pie chart in a decision support system. HP Distributed
Smalltalk allows various modes of collaboration between
the semantic and presentation objects, including messages
that are handled by the object request broker. (For a simple
but complete example see the HP Distributed Smalltalk
User's Guide, chapter 10.)

After this abstract introduction of HP Distributed Smalltalk's
semantic/presentation split architecture we will describe in
more concrete terms how it works for the encapsulation
procedure. As stated above, encapsulation must achieve two
goals: it has to present a graphical representation of the
artifact (product, tools, demos, documentation) in its native
mode to the remote client, and it must allow the remote user
to launch the artifact at the server side through this
representation. HP Distributed Smalltalk has a pair of classes,
MediaSO and MediaPO, that accomplish exactly this. (The
suffixes SO and PO imply that semantic and presentation
objects, respectively, are spawned by these classes.) Tracing
the interaction diagram between two objects of these
classes we found that there exists a ready-made method
called updatePresenter, visible in the MediaSO class, that creates
the remote presentation object of a product or other artifact
in the server. To customize the generic MediaSO and MediaPO
classes and the method updatePresenter for the encapsulation
of specific artifacts we first created the narrower subclasses
ArtifactSO and ArtifactPO. Then we augmented ArtifactSO with
the attributes of artifacts such as vendor and product
names. Finally, using overloading, we extended the method
updatePresenter to include, among several other
administrative tasks, the crucial behavior required for launching the
artifacts while exporting their display to the client platform.
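
The subclassing step can be sketched in Python as follows. The class names MediaSO and ArtifactSO and the method name (here update_presenter, mirroring updatePresenter) come from the text; everything else, including the attributes, the dictionary returned as a stand-in for a presentation object, and the example values, is hypothetical. The subclass calls the inherited method and then adds to its result, which is one plausible reading of the extension described above.

```python
class MediaSO:
    """Generic semantic object (name from the text; body is a stand-in)."""

    def __init__(self, name):
        self.name = name

    def update_presenter(self, client):
        # Stand-in for creating the remote presentation object.
        return {"client": client, "icon": self.name}


class ArtifactSO(MediaSO):
    """Narrower subclass carrying artifact attributes such as vendor and product."""

    def __init__(self, name, vendor, product, launch_command):
        super().__init__(name)
        self.vendor = vendor
        self.product = product
        self.launch_command = launch_command

    def update_presenter(self, client):
        # Reuse the inherited behavior, then extend it.
        presenter = super().update_presenter(client)
        presenter.update(
            vendor=self.vendor,
            product=self.product,
            launch=list(self.launch_command),   # runs on the server
            display=f"{client}:0",              # windows appear on the client
        )
        return presenter


artifact = ArtifactSO("demo-tool", "Example Vendor", "Example Product",
                      ["/opt/example/bin/tool"])   # hypothetical values
presenter = artifact.update_presenter("consultant-pc")
```

Code that only knows about MediaSO can call update_presenter on an ArtifactSO and receive the extended result without change.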

Concurrent with this architectural design of the classes and
methods that bring about encapsulation in the Software
Solution Broker, a few product-dependent steps must also
be taken. This is done at the instance or object level of every
concrete artifact (such as a product) so that it will behave in
its expected, native mode. This is a simple matter of
inserting the right environment variables and parameters in an
encapsulation dialog window. The required information can
easily be gleaned from the installation manual of the
particular product that is being encapsulated. Finally, products,
tools, and other components are put into folders and the
encapsulation is done.
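
What the dialog collects can be pictured as a command plus a set of environment variables. The Python sketch below assembles them; it assumes an X11-based product, for which setting DISPLAY redirects windows to another screen, and the function name build_launch, the path, and the variable EXAMPLE_HOME are hypothetical. The text does not describe the actual mechanism at this level of detail.

```python
import os
from typing import Dict, List, Tuple


def build_launch(command: List[str], product_env: Dict[str, str],
                 client_display: str) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for starting a product on the server.

    product_env holds the values entered in the (hypothetical) dialog;
    DISPLAY sends an X11 program's windows to the client's screen.
    """
    env = dict(os.environ)        # start from the server's own environment
    env.update(product_env)       # product-specific settings take precedence
    env["DISPLAY"] = client_display
    return list(command), env


argv, env = build_launch(
    ["/opt/example/bin/tool", "-demo"],     # hypothetical install path
    {"EXAMPLE_HOME": "/opt/example"},       # hypothetical product variable
    "consultant-pc:0",
)
```

The resulting pair could be handed to the standard library call subprocess.Popen(argv, env=env) on the server; the sketch stops short of starting a process.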

Use of Object Technology 

The design and building of the Software Solution Broker
were characterized by a short development time, a minimal
amount of new coding, and a high degree of reuse. The major
reason is the application of object-oriented technology. The
object-oriented use is pervasive throughout the design, as
indicated above, but it is helpful to point to specific
examples. We'll give two examples for the object-oriented
features inheritance and polymorphism in the context of
encapsulation.

One of the examples has just been described: the subclass
ArtifactSO of the class MediaSO inherited the method
updatePresenter, which in turn, through the feature of polymorphism,
was overloaded (that is, extended to include additional
functional behavior).

The encapsulation dialog window provides another example.
As an administrative tool, it is not available to the user. It is
an object built from a subclass of the existing HP Distributed
Smalltalk class called SimpleDialog. From this class, the
window inherits characteristics such as its property to pop up
in front of other windows (it's not obscured), its basic layout

HP Software Solution Broker Accessible Products



Vendors

• Cognos Corp.
• ParcPlace System
• XVT Software
• Itasca System
• Informix
• Neuron Data
• Sybase
• Unison Software
• ProtoSoft
• Oracle
• Dynasty
• NetLabs

Tools

HP-UX*

• Cognos Corp.
o PowerHouse 4GL 7.23

• ParcPlace System
o VisualWorks 2.0
o VisualWorks with Sybase connectivity

• XVT Software
o XVT-Design (C Developer Kits)
o XVT/XM (C Developer Kits)
o XVT-Power++
o XVT/XM (C++ Developer Kits)
o XVT-PowerObject Pak I

• Itasca System
o ODBMS Server
o Developer Tool Suite
o C Interface
o Lisp Interface
o API Libraries (C++, CLOS, Ada)

• Informix
o Informix Online R4GL
o Informix WingZ
o Informix SE R4GL
o Informix SE ISQL
o Informix Hyperscript Tools
o Informix Online ISQL

• Neuron Data
o Smart Elements (Nexpert Object)
o Smart Elements (Openedit)
o Open Interface Elements (Openedit)
o CS Elements (Openedit)

• Sybase
o SA Companion (client & server)
o SQL Monitor (client & server)
o SQL Debugger inspector
o SQL Debugger console
o SQL Data Workbench
o SQL APT Edit
o SQR Workbench (EasySQR)
o Open Client/Server
o iSQL/SQL Server

• Unison Software
o Maestro
o Load Balancer
o Express
o RoadRunner

• ProtoSoft
o Paradigm Plus

• Oracle

• NetLabs
o NetLabs/Asset Manager
o NetLabs/Vision
o NetLabs/Assist
o NetLabs/NerveCenter
o NetLabs/Manager
o NetLabs/OverLord Manager
o NetLabs/Discovery

MS Windows

• Cognos Corp.
o PowerHouse Windows 1.2E
o Axiant
o Impromptu
o PowerPlay

• ParcPlace System
o VisualWorks 2.0
o VisualWorks with Sybase connectivity

• XVT Software
o XVT-Design (C Developer Kits)
o XVT/Win (C Developer Kits)
o XVT-Power++
o XVT/Win (C++ Developer Kits)
o XVT-PowerObject Pak I for MS Windows

• Itasca System
o ODBMS Server
o API Libraries (C++)

• Informix
o New Era

• Neuron Data
o Smart Elements (Nexpert Object)
o Smart Elements (Openedit)
o Open Interface Elements (Openedit)
o CS Elements (Openedit)

• Sybase
o Net-Library
o Open Client/C
o SQL Monitor Client
o SQR Workbench
o APT Execute

• ProtoSoft
o Paradigm Plus

• Oracle

• Dynasty Technologies
o Dynasty

• NetLabs
o NetLabs/Vision DeskTop

Artifacts 

• Cognos Corporation 

o QUICK Application 

o QU[ZAppfi cation 

o PDL Apphcation 

o QDESIGN Appljcaiion 

o QTP Application 

o QUTIL Application 

o PDL And Utilities Reference Msnyal 

o PowerHouse for UNIX - Primer 

• ParcPlace System 

o Product Overview 

• XVT Software 

o Product Overview 
o XVT Design Tutorial 
o XVT Database Demo 
o xVT-PDwer++ Overview 
o XVT-PDwer++ Demo Gufde 
Q XVT Power ++ Earth Demo 

• Itasca System 



• Informix 

o Product Overview 

o Informix R4GL Demu 

o Six demos with source cades 

o Informix ISQL Demo 

o Infomiix Hyperscript Demo 

• Neuron Data 

o Product Overview

o Notepad Widget example with source files

o Pack example with source files

o Print widget example with source files

o Resize widget example with source files

o Resource Picker example with source files

o Scripting example with source files

o Scroll area usage example with source files

o Scroll bar usage with source files

o Sliders usage with source files

o Special widget example with source files

o String search example with source files

o Text edit validation example with source files

o Windows MD example with source files

o Alert Windows example with source files

o BrowSBx example with source files 

o Brows I nc example with source files 

o Cbox example with source files

o Chart example with source files

o Clock Widget example with source files

o C++ Notepad widget example with source files

o Drag drop example with source files

o Draw example with source files

o Drop Down pale example with source files

o File manager example with source files

o File name translator example with source files

o File Picker example with source files

o Floating window example with source files

o Gantt chart example with source files

o Help engine example with source files

o Help viewer example with source files

o ICON generator example with source files

o List Box example with source files

o Local drag drop example with source files

o Menu example with source files

o Multiple font text example with source code

o Notepad example with source files

• Sybase 

o SyBooks 

o APT Demo 

o Compute example with source files

o Csr_disp example with source files

o I18n example with source files

o blktxt example with source files

o Five other examples with source files

• Unison Software 

• ProtoSoft 

• Oracle

• Dynasty 

• NetLabs 

HP-UX is based on and is compatible with Novell's UNIX
operating system. It also complies with X/Open's XPG4,
POSIX 1003.1, 1003.2, FIPS 151-1, and SVID2 interface
specifications.

UNIX is a registered trademark in the United States and other
countries, licensed exclusively through X/Open Company
Limited.

X/Open is a trademark of X/Open Company Limited in the UK
and other countries.

MS Windows is a U.S. trademark of Microsoft Corporation.
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with a text box and O.K. and cancel buttons, and its link to a 
value holder that holds the environmental variables, names,
and other information needed for the encapsulation. The
only method needed in addition to inherited ones is the one
requesting the encapsulation parameters mentioned above.

The same procedure, that is, the use of predefined classes
and thus minimal coding, applies to HP Distributed
Smalltalk's folders containing the encapsulated product with
its tools and other artifacts. The HP Distributed Smalltalk
class FolderPO (PO indicates it is the folder class spawning
presentation objects) has a method windowMenu, which creates
a window with several pop-up menus that have labels such
as Action, Edit, and so on. For a subclass of FolderPO called
FolderPlusPO, these properties of windowMenu are inherited,
but windowMenu is also changed (while keeping the same name),
by the addition of a method artifactCreate and its label in
one of the pop-up menus of windowMenu. The method
artifactCreate is responsible for the inner workings of the
encapsulation dialog window mentioned above.
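
The inherit-and-extend pattern described here can be sketched in
a few lines. The sketch below is in Python rather than Smalltalk
and is only an illustration: the class and method names mirror
FolderPO, FolderPlusPO, windowMenu, and artifactCreate from the
text, but the menu representation, the "Create Artifact" label,
the choice of the Action menu, and the fields returned by the
dialog stand-in are all assumptions.

```python
class FolderPO:
    """Stand-in for the folder presentation-object class."""

    def window_menu(self):
        # Pop-up menus keyed by label; the text names only Action and Edit.
        return {"Action": [], "Edit": []}


class FolderPlusPO(FolderPO):
    """Subclass that keeps the inherited menus and adds one entry."""

    def window_menu(self):
        # Reuse the inherited behavior under the same method name.
        menus = super().window_menu()
        # Hypothetical label and menu choice; the text says only
        # "one of the pop-up menus".
        menus["Action"].append(("Create Artifact", self.artifact_create))
        return menus

    def artifact_create(self):
        # Would drive the encapsulation dialog; here it only returns
        # the kind of fields such a dialog might collect (assumed).
        return {"environment_variables": {}, "names": []}


if __name__ == "__main__":
    for label, entries in FolderPlusPO().window_menu().items():
        print(label, [name for name, _ in entries])
```

The point of the sketch is that the subclass adds one method and
one menu entry while every other property of the menu comes from
the existing class.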

Development Methodology 

Funding for the Software Solution Broker project was subject
to the condition that the development team find, justify, and
implement a design that brings the tools to the consultants in
the fastest possible way with the least amount of resources,
including development, maintenance, and support resources.
At the same time, every released version, even the very first
one, had to find immediate user acceptance. Based on these
stipulations the team chose a development method that is a
hybrid of iterative prototyping and the Fusion method.

Our reasons for favoring iterative prototyping over a
classical software design paradigm that starts with a complete
specification (such as the so-called waterfall model) were:

• Time constraints. There are never enough engineer-months
to write a complete specification, implement and test it into
production strength.¹

• Constraints imposed by the intrinsic nature of the Software
Solution Broker tool we were building, that is:

o Client-side usability. The GUI that was eventually chosen
is the result of repeated testing by potential users to
achieve maximum ease of use and intuitiveness, and this
amount of trial-and-error cannot be specified in advance.

o Tool accessibility. The different products on the virtual
shelf have different behaviors and their own requirements
for resources and administration, and creating the
encapsulation process again requires much experimentation and
gradual maturation based on experience that cannot be
specified a priori.

o Using the object paradigm. The software substrate chosen
(HP Distributed Smalltalk with VisualWorks) is well-suited
for the rapid development of GUI and client/server
applications.

Based on these considerations, our overall approach was
that of evolutionary prototyping, in which a fully functional
prototype is ushered through repeated refinement steps into
a production-strength end product. We realize that often a
prototype leads only to an executable specification or a
validated model, not a high-quality, stable product. However,
in our case the sophisticated framework of HP Distributed
Smalltalk with its semantics/presentation split and
VisualWorks with its Model View Controller ensured full
functionality and high quality at each refinement step because
we reused the existing, high-quality code (including the
library of classes) and very sparingly added new, thoroughly
tested code, preferably as instances (objects) of the existing
class library.

Fusion Method 

While iterative prototyping can be seen as a software
development philosophy that is primarily dictated by business
requirements such as time to market, break-even time, or
optimal return on investment, the Fusion method² was
developed with the goal of creating a language-independent,
comprehensive software project management method.
Being a systematic object-oriented development method, it
blends well with our software substrate, which we chose
based on openness, compliance with industry standards,
ease of use, and the ability to separate the server
(semantics) from the remote clients (presentation). The Fusion
method emphasizes a modular design process in clearly
demarcated phases, so it synchronizes well with the iterative
prototyping approach, which requires the repetition and
refinement of certain development stages without impacting
others. Furthermore, the Fusion method insists that a
software development process of the complexity encountered
today must cover the entire software development life cycle.
The Fusion method's phased development process served as
the blueprint for the Software Solution Broker. It can be
summarized as follows² (our italics):

Starting from a requirements document, the analysis phase
produces a set of models that provide a declarative
description of the required system behavior. The analysis
models provide high-level constraints from which the design
models are developed. The design phase produces a set of
models that realize the system behavior as a collection of
interacting objects. The implementation phase shows how to
map the design models onto implementation-language
constructs.

In our hybrid approach we take an early, loosely defined
functional prototype as our initial requirements definition
(an executable specification), to be modified and refined in
subsequent iterations through the three phases of analysis,
design, and implementation. After each of these phases a
review of the phase outputs is conducted by the development
team in conjunction with users. The results of this
audit are prioritized and, if deemed important, incorporated
into the prototype which, through several of such review
loops, evolves after a full cycle into the production product.
(For details about the outputs mentioned and the complete
Fusion process breakdown see reference 2, especially
Appendix A.)

In summary, the two complementary methods of iterative
prototyping and Fusion serve two main purposes. First, at
the end of each prototyping cycle a fully functional
production-strength product is released. Second, the three
Fusion phases (analysis, design, and implementation) of every
cycle are independent of the phases in another cycle.
Therefore, we are in effect working towards several releases
at the same time (see Fig. 6).
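
To make the idea of overlapping cycles concrete, the following
Python sketch lays out which phase of which cycle is active at
each step. It is only an illustration: the three phase names come
from the text, but the assumption that each new cycle starts
exactly one phase after the previous one, and the function name
schedule, are ours and are not taken from the project.

```python
PHASES = ["analysis", "design", "implementation"]


def schedule(num_cycles, offset=1):
    """Return {time_step: [(cycle, phase), ...]} for staggered cycles.

    offset is the assumed number of phases by which each cycle lags
    the previous one.
    """
    timeline = {}
    for cycle in range(1, num_cycles + 1):
        start = (cycle - 1) * offset
        for i, phase in enumerate(PHASES):
            timeline.setdefault(start + i, []).append((cycle, phase))
    return timeline


if __name__ == "__main__":
    for step, active in sorted(schedule(2).items()):
        print(step, active)
```

With two cycles and an offset of one phase, the middle steps show
two cycles active at once, which is the sense in which several
releases are being worked on at the same time.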
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[Fig. 6 diagram: two overlapping development cycles (steps
suffixed 1 and 2). Each cycle proceeds through System Model,
Identify Product Set, Identify Classes, Design Product Model,
Design Encapsulation Classes, Encapsulate Product Set,
Integrate (Folders), Test, and Release, with Review loops
between stages.]

Fig. 6. Software Solution Broker development used iterative
prototyping and the Fusion method, resulting in parallel
development cycles.

Customizing the Software Solution Broker

In addition to being a productivity tool and a hub of product
expertise for HP's technical consultants, the Broker can be
customized to meet the business needs of end customers as
well. To sketch how such a customization can be done using
the object-oriented framework of HP Distributed Smalltalk,
imagine a vendor of CAD (computer-aided design) software.
Rather than offering shrink-wrapped software packages on
the shelves of the store, the retailer wants to offer
customers an environment where they can, by navigating
through virtual shelves, choose interesting products and
"test drive" them in the store before deciding what to buy.

For an end customer such as the CAD software vendor, the
Broker can be customized by mapping the particular customer
requirements into several levels of design complexity.
These levels describe in technical terms what level of
intervention into the framework of HP Distributed Smalltalk is
needed to alter and customize the existing classes and
methods. On the lowest level, the requirements fit the HP
Distributed Smalltalk framework exactly, and the system can be
built from existing classes without change. A higher level of
intervention would be needed to construct the Software
Solution Broker for the CAD software vendor. Slight
modifications of core services (relating to containment and
life cycle semantics), in addition to class augmentation and
overloading of methods, would be recommended. Working
with predefined, well-documented levels of intervention that
are necessary to meet a customer's requirements has the
advantage of communicating to the customer in advance,
during the analysis and before system design begins, how
much reuse of the framework is possible, and how much
nonframework augmentation is necessary. Intervention
levels are thus not only technical assessments but also
indicators of the final costs for the system.



Conclusion

The Software Solution Broker was not a typical client/server
application development project. We were not primarily
concerned about two-tier or three-tier architectures, about
objects per se, about the one "right" programming language, or
about coding. In fact, we went the opposite route. Based on
the working requirements of HP's technical consultants and
our own analysis of how consultants work with customers,
we resolved to translate these requirements into a system
built from distributed objects. The building, however,
consisted mainly in the skillful choice of existing classes
and the exploitation of HP Distributed Smalltalk's framework.
The novelty in our approach lies not in the coding of new
structures, but in the extensive application of reuse. In fact,
whenever new code seemed required, we took it as a warning
that further analysis was needed to look for prefabricated
code within the framework of HP Distributed Smalltalk.
This simple principle, essential for a fast time to market,
also guaranteed a short turnaround time and high quality.

Through its first two releases, 1.0 and 2.0, the Software
Solution Broker can be viewed as a distributed productivity
tool offering three overlapping types of services. These three
types can be described metaphorically as a virtual software
shop for the display of individual products, a consultative
workbench or simulated classroom for studying and
experimenting with several collaborating products, and a
virtual demo center with remote satellite offices where the
technical consultants can build prototypes and create demos
for a customer. Looked at from a broader perspective, however,
the Software Solution Broker architecture and implementation
are, with small customization, also ideal for other, related
applications that require one (or a few) persistent centers
and many locally distributed and individually presented
clients. One example is software distribution. Another is the
establishment of a worldwide software application development
lab where each satellite group can develop its own
part locally, check it in with a central repository where it is
available to the other satellites, and participate remotely in
the integration of the parts into a system. Furthermore,
object technology, with its concept of containers, makes
available compound documents (text, picture, voice, video,
etc.) that can be employed also on the nontechnical side of
business as vehicles for elaborate project proposals and other
communication with business customers, for instance, to
propose a solution by showing a video of a prior, successful
installation (this would take the place of a paper document
of reference sites). In this role, the Software Solution Broker
can be a worldwide business solutions exhibit and a
convenient repository for a portfolio of repeatable solutions
from which the customer, advised by a consultant, can select
products the way we now choose from mail-order catalogs.

Acknowledgments

A project survives by the benevolent patience of its sponsors,
the enthusiasm of the project team, and the acceptance of the
receiving customers. In the case of the Software Solution
Broker we've tried to justify this benevolence by turning our
research ideas into a useful tool in the shortest possible time.
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This feat was only possible with the support of our managers
David Kirkland and Sherry Harvey. Special thanks are due
Chu Chang, general manager of the Professional Services
Division, for his encouragement. We are also very grateful
for the contributions and for the heated (but objective)
discussions with friends and partners from many HP entities.
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consultants in the field. The Software Solution Broker is
dedicated to them.

References

1. F.P. Brooks, The Mythical Man-Month: Essays in Software
Engineering, Yourdon Press, Englewood Cliffs, 1982.

2. D. Coleman, et al, Object-Oriented Development: The Fusion
Method, Prentice Hall, 1994.
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Bugs in Black and White: Imaging IC 
Logic Levels with Voltage Contrast 

Voltage contrast imaging allows visual tracking of logical level problems 
to their source on operating integrated circuits, using a scanning electron 
microscope. This paper presents an overview of voltage contrast and the 

methods developed to image the failure of dynamic circuits in the
floating-point coprocessor circuitry of the HP PA 7100LC processor chip.

by Jack D. Benzel 



As pressure for higher performance and higher integration
drives integrated circuit design towards increasing
complexity, IC designers need an ever-broadening set of
analysis and debugging tools and methodologies for tracking
down functional bugs and electrical margin issues in their
designs.

In developing the new HP PA 7100LC PA-RISC microprocessor
chip, the floating-point arithmetic logic unit (FPALU)
megacell used design techniques based on the PA 7100
design.¹ The FPALU design is implemented with mostly
mousetrap-style dynamic logic² with significant use of
single-ended dynamic logic in the last pipeline stage.

Past experience in debugging electrical problems in
mousetrap designs has shown these problems to be very
difficult to find.³ A failure mechanism that emerged in
prototypes of gate-biased PA 7100LC FPALUs proved highly
challenging and evasive and required a large engineering
effort to get from detection to the root cause identification.
The voltage contrast imaging methodology proved useful in
analyzing and later confirming the root cause of the failure
mechanism. Results from the analysis allowed us to correct the
design and verify its quality.

The Wall 

The FPALU failure mechanism was named "the wall" because
of its appearance on a frequency-versus-voltage shmoo plot
depicting regions of passing and failing vectors (see Fig. 1).

Considerable engineering resources were applied toward
finding the root cause of the wall using many of the
techniques that had proved successful on previous design
projects, including but not limited to shmoo plots, failing
vector/opcode analysis, clock phase stretching, focused ion
beam (FIB) experiments, and simulations of probable circuit
failures. These techniques were not providing enough
information, and a new methodology was clearly needed.

Why Voltage Contrast? 

Another HP design team had recently had success in using
an electron-beam prober to track down the root cause of a
noise problem on the same CPU chip.

Previous experience with another project several years ago
provided insights into a methodology similar to
electron-beam probing called voltage contrast, using a
scanning electron microscope (SEM). After considering the
various trade-offs it was decided to proceed with the voltage
contrast imaging while keeping open the option of going to
electron-beam probing if further analysis was required.

SEM Fundamentals 

The SEM displays objects by sensing and imaging the release
of secondary electrons from the surface of a sample which
is held in a very high vacuum. A finely focused beam of
electrons accelerated from an electron gun with a
thousand-volt potential is swept over the surface of the
sample in much the same way that a television screen is
scanned. As the high-energy electrons in the beam strike the
sample, several valence electrons will be "knocked loose" from
the sample as the impinging electrons lose energy. These
now-free electrons, or secondary electrons, find their way to
the surface of the sample and are released from the surface.
A highly biased metal screen situated near the sample collects
escaping secondary electrons into a detector which generates a
signal proportional to the number of electrons collected.
The signal from the detector is amplified and displayed on a
CRT screen which is scanned in synchronization with the
electron beam sweeping the sample.
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Fig. 1. Shmoo plot of "the wall."
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Fig. 2. (a) First pass "charge" of IC surface with secondary electrons.
(b) Second pass "read" of charged surface (bottom), resulting video
signal (middle), and 3D video image (top).



Voltage Contrast Imaging

Voltage contrast imaging uses the electrical nature of the
SEM to view voltage potentials on a sample changing with
time. Figs. 2a and 2b show a cross section of the top two
metal signal layers of an IC with the metal lines insulated by
an oxide.

The imaging is done in two stages: charging and reading.
Fig. 2a shows the state of the IC at the end of the charging
stage. The positive potential of the biased metal lines
attracts and holds the generated secondary electrons on the
surface of the oxide above the metal lines. These charges will
remain on the surface for long periods of time, basically
acting like a capacitor.

Fig. 2b shows the state of the IC at the end of the read stage
with the voltage potentials of the metal lines now changed.
The resulting detector signal level and the CRT image
generated from it are also shown above the cross section. As
the electron beam sweeps the surface of the sample, the
electrons that were once held by the positive charge of the
upper-left and lower metal lines (Fig. 2a) are knocked off the
surface and are collected into the detector, generating a
bright signal on the CRT. On the other hand, the upper-right
metal line is now more positive, and the surface above it will
release fewer secondary electrons as the surface capacitively
charges, corresponding to a lower number of electrons
collected and thus a darker signal on the CRT.
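
The charge/read behavior described for Fig. 2 can be restated
as a small qualitative model. The Python sketch below is only an
illustration of the reasoning in the text, not a physical
simulation: the function name read_brightness, the boolean
representation of line potentials, and the "neutral" label for
lines that did not change are assumptions made for the example.

```python
def read_brightness(charge_levels, read_levels):
    """Classify each metal line's appearance in the read pass.

    Both arguments map a line name to True (positive potential)
    or False (ground) during the charge and read stages.
    """
    result = {}
    for line, was_high in charge_levels.items():
        now_high = read_levels[line]
        if was_high and not now_high:
            # Electrons held during charging are knocked off and collected.
            result[line] = "bright"
        elif now_high and not was_high:
            # The surface now charges capacitively and releases fewer electrons.
            result[line] = "dark"
        else:
            # No change between stages (label assumed for this sketch).
            result[line] = "neutral"
    return result


if __name__ == "__main__":
    # Line names follow the positions mentioned for Figs. 2a and 2b.
    charge = {"upper_left": True, "upper_right": False, "lower": True}
    read = {"upper_left": False, "upper_right": True, "lower": False}
    print(read_brightness(charge, read))
```

Only lines whose potential differs between the two stages show
contrast in this model, which matches the later observation that
a read taken without powering down shows only the lines that
changed state.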



Fig. 3. Video image of DUT fixture for voltage contrast setup with
top shield removed.



DUT Preparation 

Preparing the IC for the SEM environment required careful
attention to several details as follows:

• Clean Power Environment. Some previous experiments
indicated that the wall was somewhat remedied by a power
environment that restricted the VDD current supply.
Therefore, careful attention was paid to provide adequate
low-inductance power feeds with adequate decoupling
capacitance.

• Simple Vector Stimulus. Restricted cabling into the SEM
chamber and easy portability between two different SEM
facilities required a simple method for executing a
wall-sensitive floating-point operation (FLOP). A successful
method was developed to launch and step through the
phases of a FLOP using the JTAG†-conforming serial test
port and a serial test board.

• Image Capture Synchronization. The capture and imaging
of events on the SEM system requires a synchronizing signal
generated by the device under test (DUT). Several small
surface mount ICs were mounted on the PA 7100LC package
to decode the clock signals and derive another synchronizing
signal to provide the SEM with an accurate sync pulse
that identified the leading clock edge at the starting phase
of the failing FLOP.

• Minimize Outgassing. To achieve an adequate vacuum in
the SEM system, materials that had minimal outgassing were
required. This prevented the use of heatshrink tubing and
quick-cure epoxies and required careful cleaning of the DUT.

• Packaging. The packaging fixture containing the CPU (see
Fig. 3) had several requirements. The wall was a
high-temperature phenomenon and required heating the part
inside of the SEM with large resistors mounted inside the
fixture. The metal enclosure shielded all but the die surface
from the electron beam, since the beam will positively
charge plastics (wiring, capacitors). The shield also
prevented electrical signals in the DUT wiring from interfering
with the beam's trajectory. The last requirement filled by the
fixturing was a compact size to fit inside the small SEM
chamber.

† JTAG is the Joint Test Action Group, which developed IEEE
standard 1149.1, IEEE Test Access Port and Boundary-Scan
Architecture.
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Fig. 4. Beam blanking and synchronization signal generation.

Imaging Dynamic Signals 

The electron beam scan is synchronized with the scan of the
video display tube and consequently has a slow refresh rate
of 1/60 second. This slow refresh rate works well for
stationary objects and static electrical signals, but the
signals of interest involved in the wall failure are dynamic,
typical of mousetrap designs. The imaging of dynamic signals
required the development of a new process.

Synchronization and Beam Blanking 

The slowest rate at which the DUT could be clocked with
reliable operation of the scan path driven through the JTAG
port was 2.5 MHz, giving a 200-ns phase or period during
which dynamic signals would be active. Connecting a pulse
generator to the DUT's sync pulse allowed the generation of
a variable-width, variable-delay pulse (see Fig. 4) which was
used to blank the electron beam scanning the DUT. Using
this blanking signal, the SEM could be controlled to charge
or read the IC only during the time of interest when the
wall-related signals were active. A 100-ns sample window was
chosen for the blank signal, which was centered in the clock
phase to reduce possible overlap into adjoining phases.
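
The timing figures above follow from simple arithmetic, which
the Python sketch below works through. It assumes two equal
phases per clock period (consistent with 2.5 MHz giving a 200-ns
phase); the function name blank_window and its return convention
are ours, chosen for illustration.

```python
def blank_window(clock_hz, window_ns):
    """Return (phase_ns, delay_ns) for a sample window centered in a phase.

    Assumes each clock period is split into two equal phases.
    delay_ns is the gap from the start of the phase to the start
    of the window; the same gap remains after the window.
    """
    period_ns = 1e9 / clock_hz
    phase_ns = period_ns / 2
    if window_ns > phase_ns:
        raise ValueError("sample window is longer than the clock phase")
    delay_ns = (phase_ns - window_ns) / 2
    return phase_ns, delay_ns


if __name__ == "__main__":
    phase, delay = blank_window(2.5e6, 100)
    print(f"phase = {phase} ns, window starts {delay} ns into the phase")
```

For a 2.5-MHz clock and a 100-ns window this gives a 200-ns
phase with 50 ns of margin on either side of the window, which is
the guard band that limits overlap into adjoining phases.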

Once the beam was properly synchronized and blanked, the
apparent lack of information in the video image shown in
Fig. 5 gave a strong indication that more development was
needed.

Image Capture 

The next problem to resolve was imaging the brief 100-ns
video information successfully. Several ideas were evaluated
and tried before an acceptable method was found:

• Photographic Film Integration. The SEM focuses the light
from a secondary CRT onto the film plane of a Polaroid
camera over a period of several minutes while exercising
the DUT. This method resulted in either completely black or
very indistinct images of the IC.

• Two-Dimensional Scan. The SEM can operate with basically
a zero-frequency vertical scan rate. This provides an image of
a single horizontal slice of the IC surface, while improving
the refresh rate. Changes in beam intensity were indiscernible
in this mode.

• Two-Dimensional Scan in Oscilloscope Mode. Using the
same two-dimensional scan mode as above, the intensity
vector of the SEM's display can be used to drive the vertical
component of the video signal. The resulting image is
reminiscent of an oscilloscope display showing intensity on
the y axis. No discernible changes in intensity were visible in
this mode as well.

• Two-Step Charge/Read. Instead of trying to charge and read
on each or every other FLOP, the process was broken into
two steps. The first step involved turning the beam on only
during the phase of interest while the part was executing
wall FLOPs over a period of three minutes. A long
integration time was required because each time the beam
turned on it only charged a tiny area of the field of view. At
the end of the integration time, the beam was turned off, the
IC powered down, and the beam blank removed from the SEM.
The IC now had a surface charge that reflected the state of
the metal lines during the phase of interest. The second step
was to turn the beam on with no blanking to read the
surface charge in its first pass over the IC. The resulting
video image was clear but brief (one video frame). This process
produced an image in which metal lines with a positive
voltage were white and metal lines at ground were black.
Another small variation in this process was not to power down
the part before the read step. The resulting image took a
little more thought to interpret because only the metal lines
that changed state from the previous step were black or
white.

• Two-Step Charge/Read with VCR Frame Capture. By adding
a VCR to the setup, the resulting video image fed to the
CRT could be captured on tape and then freeze-framed for
viewing. The purchase of a VCR with a forward and reverse
single-frame jog shuttle control greatly aided in isolating the
image captured on a single frame. It was apparent from the
videotape that the majority of the IC's surface charge was
removed in the first sweep of the beam across the die area.
This last methodology was used successfully for imaging
the dynamic signals in the FPALU.

Results 

Once the methodology was established, over 120 images
were captured and catalogued on video tape over a
four-week period. Several days were spent at the outset trying
to understand why an active clock line in the imaged phase



Fig. 5. Video image of the first-pass imaging attempts.
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Fig. 6. Zero_topH mousetrap buffer.

was not showing activity, a key indicator that the proper
phase of the FLOP was being captured. This issue was never
satisfactorily resolved, yet phase-by-phase clock gating in
the FPALU ensured that the signals would only be active
and thus visible in the phase of interest.

Figs. 6, 7a, and 7b show the schematic, artwork, and voltage
contrast image of probably the clearest failure identified.
The circuit in Fig. 6 shows a mousetrap buffer whose
storage node, sL, was somehow being compromised, possibly
through a ground differential problem or a noise spike on
the input.
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Fig. 7. (a) Metal 3 plot of Zero_topH buffer with failing input/output
pair A. (b) Voltage contrast image of victimized buffer with failing
input/output pair A.



Circle A in Fig. 7a identifies the buffer's input on the left
and the output on the right. The expected value of each metal 3
line is indicated above the lines (L=Low, H=High).

Fig. 7b shows a voltage contrast image captured from the
videotape showing the failure of the buffer. The image
clearly shows a low level on the input (black) and a high
level (white) on the output of the buffer in circle A. Note the
difference between circle A and circle B which identifies the
input and output of an identical buffer with no failures. It
became clear from this picture that the electrical event that
caused the buffer to output a high level was transitory in
nature and not a static event. The read step of the image
was taken with the IC powered down.

Metal 1 and even metal 2 lines can be difficult to image unless
they are well-isolated from other metal structures. Fig. 8a
shows the artwork and expected values where several metal
1 lines were imaged. The vertical metal 1 route in circle A
should have a high or white level and the route to the right
of it in circle B should have a low or black level.

Fig. 8b is the voltage contrast image showing the logical
misfiring (high/white) of the metal 1 route in circle B. This



Fig. 8. (a) FS[ABCD] bus artwork. (b) Voltage contrast image of
FS[ABCD] bus in metal 1 showing correct firing of the lines in
circle A and the incorrect firing of lines in circle B.
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Fig. 9. (a) Metal 3 structure (vertical routing) in passing state at
nominal voltage. Horizontal routes are metal 2. (b) Metal 3 structure
in failing state at high voltage with wall failure.



failure was not seen until the root cause of the wall was
identified and the proper FLOP for arming the failure was
identified.

The logical states of individual lines of dense bus structures
in lower metal levels can be difficult to discern, yet
differences between two states can often be readily identified.

Figs. 9a and 9b illustrate the differencing technique with an
example of a metal 3 structure in both a passing and failing
state (note the differences in the vertically routed lines in
the top-center of the figures). The bend or distortion in Fig.
9b is the result of poor synchronization between the SEM
and the VCR that recorded the images. Note also the
changes in the horizontally routed metal 2 lines.
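
The differencing technique was applied by eye to video frames,
but the same idea can be sketched on digitized images. The
Python example below is a hypothetical illustration only: it
assumes the passing and failing frames are already aligned
8-bit grayscale arrays of equal size (which the distortion noted
for Fig. 9b would complicate), and the function name and
threshold value are arbitrary choices.

```python
def difference_mask(passing, failing, threshold=64):
    """Mark pixels whose gray level differs by more than threshold.

    passing and failing are equal-sized lists of rows of 0-255
    gray levels; the result is a same-shaped grid of booleans.
    """
    return [
        [abs(p - f) > threshold for p, f in zip(row_p, row_f)]
        for row_p, row_f in zip(passing, failing)
    ]


if __name__ == "__main__":
    # Tiny made-up frames: two pixels change state between them.
    passing = [[10, 200], [10, 10]]
    failing = [[10, 20], [240, 10]]
    for row in difference_mask(passing, failing):
        print(row)
```

Pixels flagged in the mask correspond to lines whose logical
state differs between the two captures, even when the absolute
level of an individual line is hard to read.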

One technique that greatly aided the interpretation of the
captured images was to plot the artwork of the areas being
imaged and annotate the plots with the expected logical
levels as derived from a simulator.

Improvements and Future Use 

It is difficult to determine if E-beam probing would have
provided quicker, more pertinent information than voltage
contrast. Each tool has its own benefits and drawbacks that
the IC designer must weigh in light of the problem to be
solved.

Additional IC physical structures and layouts could make
new designs more amenable to voltage contrast imaging as
well as E-beam probing and FIB experiments. These features
could provide regular, systematic, top-level-metal access to
control and data path signals throughout the design.
Top-level-metal access could be provided through directed routing
or through "via stacks" to top layers from lower-level metal
routes. The efficiency of such features in terms of improved
accessibility versus increased layout area is unknown.



The image quality obtained from the SEM for voltage contrast
work could be improved by changing the electron gun
filament from tungsten to a crystalline element. The crystalline
filament would increase the beam current and thus
effectively provide a brighter image without increasing the beam
energy, which reduces resolution.

Conclusions 

The use of voltage contrast imaging proved to be a useful
tool for analyzing and verifying the FPALU margin failure
known as the wall. Although the information gleaned from
the process did not lead directly to the discovery of the root
cause of the failure, the voltage contrast process functioned
well as a clue generator as suggested in reference 3 and
provided important confirmation of the root cause hypothesis.
Component and System Level Design-for-Testability
Features Implemented in a Family of Workstation Products

Faced with testing over twenty new ASIC components going into four 
different workstation and multiuser computer models, designers formed a 
team that developed a common system-level design-for-testability (DFT)
architecture so that subsystem parts could be shared without affecting 
the manufacturing test flow. 

by Bulent I. Dervisoglu and Michael Ricchetti



Members of the latest-generation family of HP workstation
and multiuser computer products use the same system
architecture and differ mostly in their I/O subsystem architecture
and configuration. From a system development point of view
an important characteristic of these products is their use of
a new high-speed system bus architecture and a large number
(over 20) of new ASIC components that were developed
to implement all of the various different configurations of
the product line. Furthermore, all components that interface
with each other via the system bus are required to operate
with the same high-frequency system clock.

A further difficulty was that four different models, ranging
from a single-user desktop workstation to a multiuser
computer, were being developed by different design teams that
were both organizationally and geographically separated
from each other. This made it necessary to develop a common
system-level design-for-testability (DFT) architecture to
be used throughout the system and across the different
computer models so that subsystem parts could be shared
among the different computer models without affecting the
manufacturing test flow.

To address these difficulties a DFT core team was formed at
the very early stages of the project. Because of the large
number of different ASIC teams involved, it was decided
that all ASIC teams at the same site would be represented by
a single representative on the DFT core team. This team has
been instrumental in achieving goal congruence among the
different design teams and manufacturing organizations.
Furthermore, the presence of the DFT core team made it
possible to develop and implement a DFT methodology that
was used by all of the ASIC teams, although the level of
adherence varied. The DFT core team also collected data and
performed DFT design reviews for some of the ASICs.

ASIC DFT Design Rules and Guidelines

One of the first activities of the DFT core team was to
develop a set of design rules and guidelines to be followed by
the ASIC design teams to ensure that DFT features would be
common among the various components. This made it
possible to share efforts and results and to access the different
DFT features in the ASICs during prototype system bring-up.
The following is a summary of these rules.

1. All (functional) system clocks must be directly controllable
from the chip pins and must not be used for any other
function. All systems use a common ASIC component (the
system clock controller ASIC) to drive their clock terminals
on the system board. This ASIC has control pins through
which it can be programmed for different clock generation
schemes as well as for starting and halting the system clocks.
Thus, not only the individual ASICs but also the entire system
board has directly controllable clocks.

2. All scan and test clocks must be directly controllable
from the component pins, which must not be used for any
other purpose. On the system board all test clocks are tied
together and controlled from a single test point.

3. For each ASIC there is a specific reset state which is
entered when the component's ARESET_L signal is asserted.
On the system board, the power-on condition is detected
and is used to reset the ASICs to a known starting state.
Next, the memory controller ASIC generates an SRESET_L
signal to all other components on the system bus. Additional
reset signals are generated by other ASICs for use locally.

4. All ASICs must implement a dedicated boundary scan
register and its associated test access port (TAP) as specified
in IEEE 1149.1 Standard Test Access Port and Boundary-Scan
Architecture. Signal scan-in and scan-out ports of all
ASICs in the system (including the PA 7200 processor, which
is on a separate module) are connected to form a single
serial scan chain.

5. Access to each ASIC's on-chip test functions must be
provided using the IEEE 1149.1 test access port (TAP) protocol.
The same TAP controller design is used or heavily
leveraged in many ASICs. This way, test features implemented in
this controller as an extension to the IEEE 1149.1 standard
were easily leveraged across different ASICs. For example,
the DRIVE_INHIBIT/DRIVE_ENABLE instructions and the OUT_OFF
bit in the boundary scan register (see "TAP/SAP Controller,"
below) are duplicated in different ASICs in this way.
6. All ASICs shall be designed to support IDDQ testing,
whenever this is not prevented by the technology used. In
most cases this requirement did not present any further
design constraints or changes. In a few cases, an internal
IDDQ enable signal had to be used to disable active pull-up
and pull-down circuits. However, because of schedule and
cost considerations the PA 7200 processor chip does not
support IDDQ testing.

7. All ASICs shall implement internal scan for testing. The
percentage of internal nodes that are scannable shall be
kept as high as possible without sacrificing major chip
area or otherwise affecting the design methodology. For
most practical purposes all ASICs have implemented
internal scan for 100% or nearly 100% of all internal flip-flops.
However, because of design style and technology
differences, some portions of the PA 7200 processor chip are not
scannable.

8. There shall be no asynchronous logic implemented in the
ASICs. Lack of asynchronous logic is an important
requirement for many CAD tools for generating test vectors.
Furthermore, this rule is intended to prevent side effects caused
by changing the internal and external signals in arbitrary
sequence. The only exception to this rule is granted for the
reset signals, which are implemented to follow a carefully
planned system reset strategy.

The following sections describe some of the DFT features
that have been implemented in the ASICs. Not all features
are implemented in all ASICs. Among the various ASICs, the
memory controller stands out as the chip with the most
extensive DFT features.

TAP/SAP Controller

Access to all on-chip DFT features is implemented through a
test controller block called the test access port/scan access
port (TAP/SAP). The test controller implements all of the
required instructions for the IEEE 1149.1 TAP controller as
well as an extensive set of public and private instructions
which are targeted mostly for internal testing of the ASIC.
Table I lists all of the TAP instructions that are implemented.
Among the public instructions that have been implemented
are the DRIVE_INHIBIT and DRIVE_ENABLE instructions which
are used to set and clear a latch in the system logic domain
(not considered part of the test logic).

System logic for all ASICs has been designed such that for
normal system operation (i.e., when test logic is not
controlling the I/O pins) the ASIC can drive out only if the
DRIVE_INHIBIT latch is cleared. Each ASIC uses its ARESET_L
input to clear the DRIVE_INHIBIT latch during power-up.
Whereas ARESET_L controls the DRIVE_INHIBIT latch only if the
TAP is in a reset state, explicit TAP instructions can be used
at other times to set or clear this latch. This scheme allows
in-circuit ATE programs to set the DRIVE_INHIBIT latch before
they terminate and reset the TAP without creating possible
board-level bus contention before removing electric power
from the board. Whereas the DRIVE_INHIBIT latch is
considered part of the on-chip system logic, it is implemented as
part of the TAP controller design so that ASIC designers
implementing normal system functions do not have to deal
with any of the issues surrounding the DRIVE_INHIBIT and
DRIVE_ENABLE operations.
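The interaction between ARESET_L, the TAP reset state, and the
two TAP instructions can be summarized with a small behavioral
model. This is only a sketch of the rules stated above, not the
actual controller logic: the class and method names are
hypothetical, and the assumption that the latch starts out set
before the first reset is made purely for illustration.

```python
class DriveInhibitLatch:
    """Behavioral sketch of the DRIVE_INHIBIT latch (names are hypothetical)."""

    def __init__(self):
        # Assumption: the power-on value is unknown, so it is modeled
        # pessimistically as "inhibited" until ARESET_L clears it.
        self.inhibit = True
        self.tap_in_reset = True

    def assert_areset(self):
        # ARESET_L controls the latch only while the TAP is in its reset state.
        if self.tap_in_reset:
            self.inhibit = False

    def tap_instruction(self, name):
        # Loading any instruction means the TAP has left its reset state.
        self.tap_in_reset = False
        if name == "DRIVE_INHIBIT":
            self.inhibit = True
        elif name == "DRIVE_ENABLE":
            self.inhibit = False

    def reset_tap(self):
        # Resetting the TAP by itself leaves the latch untouched.
        self.tap_in_reset = True

    def can_drive(self):
        # In normal operation the ASIC drives out only if the latch is cleared.
        return not self.inhibit


asic = DriveInhibitLatch()
asic.assert_areset()                    # power-up reset, TAP in reset
power_up_ok = asic.can_drive()

asic.tap_instruction("DRIVE_INHIBIT")   # in-circuit ATE program quiets the outputs
asic.reset_tap()                        # then resets the TAP before terminating
still_quiet = not asic.can_drive()
print(power_up_ok, still_quiet)
```

The second half of the example is the in-circuit ATE case
described above: the outputs stay disabled after the TAP is
reset, so no bus contention can arise before power is removed.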





Table I
TAP Instructions

Instruction       Drive I/O Pads       Scan Register

EXTEST            Boundary Register    Boundary
BYPASS            System Logic         Bypass
SAMPLE/PRELOAD    System Logic         Boundary
IDCODE            System Logic         ID Code
HI_Z              High-Impedance       Bypass
DRIVE_INHIBIT     Boundary Register    Bypass
DRIVE_ENABLE      System Logic         Bypass
SCAN_INTERNAL     System Logic         f(Mode)
CHIPTEST          High-Impedance       f(Mode)
INTEST            Boundary Register    Boundary
DR_SCAN           System Logic         f(Mode)
SELECT_MODE       Boundary Register    Mode
SET_MODE_BIT      Boundary Register    Mode
CLR_MODE_BIT      Boundary Register    Mode
ISAMPLE           System Logic         Bypass
ESAMPLE           System Logic         Bypass
DS_DRIVE          Boundary Register    Boundary
DS_RECEIVE        System Logic         Boundary



Other TAP instructions are used to set and clear bits of the
mode register to provide access to additional test features
such as IDDQ testing, double-strobe, and so on. It is also
possible to speed up internal scan operations by switching on the
parallel scan bit in the mode register. This feature enables
multiplexing of the chip's I/O pins to perform serial scan-in
and scan-out of the internal scan register by breaking it into
three independent sections which are scanned in parallel
together with the boundary register, which is always scanned
using the test data in and test data out pins of the TAP.
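A rough estimate shows why the parallel scan bit matters. The
sketch below counts shift cycles per vector under two
assumptions that are not stated in the text: that in serial mode
the internal and boundary registers are shifted one after the
other through the TAP pins, and that the three internal sections
are of nearly equal length. The register sizes are hypothetical.

```python
def scan_cycles(internal_bits, boundary_bits, parallel=False, sections=3):
    """Estimate shift cycles needed to load one test vector."""
    if not parallel:
        # Assumption: serial mode shifts both registers through TDI/TDO in turn.
        return internal_bits + boundary_bits
    # Parallel mode: the internal register is split into equal sections that
    # shift at the same time as the boundary register.
    section_length = -(-internal_bits // sections)  # ceiling division
    return max(section_length, boundary_bits)


# Hypothetical register sizes, chosen only for illustration.
serial = scan_cycles(internal_bits=3000, boundary_bits=400)
parallel = scan_cycles(internal_bits=3000, boundary_bits=400, parallel=True)
print(serial, parallel)
```

With these made-up sizes the shift time per vector is set by the
longest of the four chains scanned in parallel rather than by
their sum.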

CHIPTEST Instruction

One of the major difficulties in implementing DFT in the
ASICs used for this project has resulted from a common
leveraged I/O pad design that contains nonscannable latches.
Furthermore, the bidirectional I/O cell implements an internal
bypass path to feed into the chip the same value that is being
driven onto the I/O pad by that chip. In effect, I/O pads
contain nonscannable pipeline stages that control both the
direction and the value of data on the I/O pad. Following a
recommendation from the DFT core team the basic I/O cell
design was modified to allow data values received by the
on-chip system logic to be set up using the dedicated
boundary scan register. In addition, system logic output values can
be captured into the boundary scan register using the system
clock. These design changes were coupled with features
provided by the CHIPTEST instruction in the TAP controller to
streamline the internal testing of the ASICs. For example, all
internal logic of the memory subsystem ASICs (memory
controller, slave memory controller, and data multiplexer) is
tested by the following sequence:

1. Load the CHIPTEST opcode into the ASIC.

2. Use test clocks to perform a parallel scan of the ASIC
internal nodes and the boundary register. At the end of the
scan-in process the newly scanned-in values are
automatically moved from the boundary register to the nonscannable
latches in the I/O cells.

3. Apply a single system clock to capture test results in
internal nodes and system logic output values in the boundary
register.

4. Repeat steps 2 and 3 for each new vector, overlapping the
scan-in and scan-out operations.

Since the CHIPTEST instruction drives the I/O pins to a
high-impedance state it is possible (indeed it is intended) to
execute these tests on a populated system board without fear of
creating board-level bus clashes during such testing.
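The overlap of scan-in and scan-out in step 4 of the sequence
above can be sketched as follows. The model is deliberately
simplified: a single scan chain stands in for the internal and
boundary registers, loading the CHIPTEST opcode and the transfer
to the I/O cell latches are not modeled, and `toy_logic` is a
made-up stand-in for the logic under test.

```python
def toy_logic(state):
    """Hypothetical logic under test: each bit XORed with its left neighbour."""
    return [state[i] ^ state[i - 1] for i in range(len(state))]


def scan(chain, vector):
    """Shift `vector` in bit by bit while the old contents shift out.

    Returns (new_chain, old_contents). The vector must match the chain length.
    """
    chain = list(chain)
    shifted_out = []
    for bit in reversed(vector):
        shifted_out.append(chain.pop())  # bit nearest scan-out leaves first
        chain.insert(0, bit)             # new bit enters at scan-in
    return chain, shifted_out[::-1]


def run_chiptest(vectors, width):
    chain = [0] * width
    responses = []
    for index, vector in enumerate(vectors):
        # Step 2: scan in the new vector; the previous response scans out
        # during the same shift cycles (the overlap of step 4).
        chain, previous = scan(chain, vector)
        if index > 0:
            responses.append(previous)
        # Step 3: a single system clock captures the response in the chain.
        chain = toy_logic(chain)
    # One last scan flushes the response to the final vector.
    _, last = scan(chain, [0] * width)
    responses.append(last)
    return responses


vectors = [[1, 0, 0, 0], [1, 1, 0, 1]]
responses = run_chiptest(vectors, width=4)
print(responses)
```

Because each scan operation both loads a vector and unloads the
previous response, N vectors cost N + 1 scan operations instead
of 2N.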

BIST Implementation

The memory controller ASIC incorporates several wide and
shallow register files that are used for queueing operations
within the data paths. The total number of storage elements
in the register files is quite large, so it was not practical to
make these storage elements scannable. Therefore, a built-in
storage test (BIST) approach was chosen to test the memory
controller data path register files.

The memory controller BIST implementation was developed 
with the following objectives: 

• Provide high coverage and short test times.

• Provide at-speed testing of the register file structures to
ensure that the memory controller ASIC works at the
required system clock frequency.

• Provide flexibility and programmability in the BIST logic to
allow alteration of the test sequence for debug and
unforeseen failure modes. In particular, the system bring-up and
debug plans provide a means for system-level scan access
to the state within the ASICs. Providing these features
allows read/write access to the nonscannable queue states
for prototype system debug.

• Provide for testability of the logic surrounding the register
files through added observation and control points at the
inputs and outputs of the register file blocks. This is intended
to support automatic test pattern generation (ATPG) tools
used to generate test vectors for the memory controller and
thus ensure high coverage of the standard cell control logic
for the queues.

The design of the BIST logic in the memory controller data
paths is based on previous work that was done for the PA
7100-based HP 9000 Model 710 workstation. For that product,
a structure independent RAM BIST architecture that uses a
pseudoexhaustive test algorithm and signature analysis was
developed and was implemented in the I/O controller ASIC.
The structure independent, pseudoexhaustive test algorithm
provides at least 99% fault coverage of typical RAM faults and can
provide 80% to 99.9% coverage of neighborhood pattern-
sensitive faults. It also allows the test time (number of read/
write accesses per memory address) to be varied according
to the desired fault coverage. BIST architectures for both
the present memory controller ASIC and the previous I/O
controller ASIC use a test algorithm similar to that described
by Ritter and Schwair. Using the system clock for BIST
execution, the RAM structure can be tested at the normal system
clock rate, thus providing at-speed testing of the RAM.

A dual-port write/single-port read register file from the
present memory controller data path, with test structures that
provide both BIST and ATPG support similar to the previous
I/O controller BIST architecture, is shown in Fig. 1. The two
write ports, A and B, can both be addressed and written
independently. The single read port can also be addressed
and read independently of the A and B write ports. Thus,
two write operations and one read operation can all occur
simultaneously for one to three register locations, depending
on the A, B, and read port addresses.

Given the dual-ported design of the memory controller
register files, it was necessary to extend the previous I/O
controller BIST architecture to test a dual-ported RAM. This
meant that the memory controller BIST implementation
should be able to test not only the simultaneous dual-write
operations but also the various combinations of A/B write
and read operations to verify that the port interactions are
working correctly. For the dual-port register files in the
memory controller such interactions include an internal
bypass when the read address is the same as either of the A
or B write addresses and a B-port dominant write when the
A and B write addresses are equal. This dual-write BIST
algorithm is described in reference 6.
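The two port interactions named above, read bypass and B-port
dominance, are easy to state as a behavioral model. The sketch
below is a hypothetical software model, not the register file
design itself. One detail is an assumption: when both ports
write the address being read, the model returns the B data on
the bypass path, consistent with B-port dominance.

```python
class DualWriteRegisterFile:
    """Behavioral sketch of a two-write-port, one-read-port register file."""

    def __init__(self, depth):
        self.cells = [0] * depth

    def cycle(self, read_addr, write_a=None, write_b=None):
        """Run one clock. write_a and write_b are (addr, data) tuples or None."""
        data = self.cells[read_addr]
        # Internal bypass: reading an address written in the same cycle
        # returns the new data rather than the stored value.
        if write_a is not None and write_a[0] == read_addr:
            data = write_a[1]
        if write_b is not None and write_b[0] == read_addr:
            data = write_b[1]  # assumption: B also wins on the bypass path
        # B is applied last, so it dominates when both addresses are equal.
        if write_a is not None:
            self.cells[write_a[0]] = write_a[1]
        if write_b is not None:
            self.cells[write_b[0]] = write_b[1]
        return data


rf = DualWriteRegisterFile(depth=4)
first = rf.cycle(read_addr=0, write_a=(1, 0xAA), write_b=(2, 0xBB))  # independent writes
bypassed = rf.cycle(read_addr=3, write_a=(3, 0x11))                  # read sees new data
rf.cycle(read_addr=0, write_a=(2, 0x01), write_b=(2, 0x02))          # same address on A and B
dominant = rf.cycle(read_addr=2)
print(first, bypassed, dominant)
```

A BIST sequence for such a structure has to exercise each of
these cases, which is why the single-port algorithm had to be
extended.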

For the register file shown in Fig. 1 each of the BIST
structures, namely LFSR (linear feedback shift register), SHIFT,
COUNT, and MISR (multi-input signature register), is dedicated
to BIST. Each register file also has its own dedicated
programmable BIST control queue for sequencing the BIST algorithm.
The BIST_MODE signal enables the BIST functions and can be
controlled either by a pin on the chip or through other test
access logic such as an IEEE 1149.1 TAP controller or
P1149.2 SAP controller. All of the BIST registers are
implemented in standard cell blocks separate from the data path
register files. A detailed description of the memory controller
BIST implementation and operation, along with hardware
overhead and test coverage, can be found in reference 6.

Fig. 1. Dual-port register file with built-in storage test (BIST) and
automatic test pattern generation (ATPG) features. The inputs to
the central embedded RAM structure are provided by multiplexing
between the normal system values and a BIST register, which is
implemented as a linear feedback shift register (LFSR). The output
multiplexer makes it possible to capture the outputs into a
multi-input signature register (MISR) and to send either the RAM outputs
or the MISR contents to the rest of the system.
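The roles of the LFSR and MISR can be shown with a toy example.
The sketch below is not the pseudoexhaustive algorithm used in
the memory controller, which is described in reference 6; it is
a much simpler write-then-read pass over a hypothetical
8-word, 4-bit RAM, with an assumed feedback polynomial, intended
only to show pattern generation and signature compression.

```python
def lfsr_step(state, taps=(3, 2), width=4):
    """Advance a Fibonacci LFSR: shift left, XOR of tap bits enters at bit 0."""
    feedback = 0
    for tap in taps:
        feedback ^= (state >> tap) & 1
    return ((state << 1) | feedback) & ((1 << width) - 1)


def misr_step(signature, data, taps=(3, 2), width=4):
    """Fold one parallel data word into the signature register."""
    return lfsr_step(signature, taps, width) ^ (data & ((1 << width) - 1))


def run_bist(ram_write, ram_read, depth, seed=1):
    state = seed
    for addr in range(depth):   # write pass: pseudorandom data from the LFSR
        ram_write(addr, state)
        state = lfsr_step(state)
    signature = 0
    for addr in range(depth):   # read pass: compress read data into the MISR
        signature = misr_step(signature, ram_read(addr))
    return signature


# Fault-free RAM model gives the reference ("golden") signature.
good = [0] * 8
golden = run_bist(good.__setitem__, good.__getitem__, depth=8)

# Faulty RAM model: bit 0 of word 0 is stuck at 0 on reads.
faulty = [0] * 8


def faulty_read(addr):
    value = faulty[addr]
    return value & 0b1110 if addr == 0 else value


observed = run_bist(faulty.__setitem__, faulty_read, depth=8)
print(golden != observed)
```

Only the final signature has to be compared against its expected
value, so the RAM outputs never need to be observed word by word
from outside the chip.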

Test Tools 

The following sections describe the tools and tests that were
used and developed to test the three memory subsystem
ASICs: the memory controller, the slave memory controller,
and the data multiplexer.

Addscan. Fig. 2 shows the flow for scan synthesis. All three
memory subsystem ASICs were designed using a
structured-custom design method. Each standard cell block used the
in-house Addscan tool for scan insertion and the scan links
between blocks were connected by hand in the top-level
netlist of the chip. The internal scan order of each block is
based on post place and route information.

Test Vector Generation. The test vector generation flow is
shown in Fig. 3. The ATPG tools from Crosscheck, Inc. were
used to generate most scan-based vectors. Some vectors
were hand-generated. A gate-level netlist of the chip, prior to
Addscan scan insertion, is used to create an ATPG database
for vector generation. A similar database is used by
designers to do Timver static timing analysis. Timver, a timing
analysis tool from Aida, Inc., is used as a part of the test
methodology for two purposes. First, it allows the design to be
checked for hold violations on all paths to guarantee that
there will be no timing violations even if ATPG vectors
exercise nonfunctional paths in the design. Second, Timver
critical paths can be fed back into ATPG in the form of a
vectortcp file to generate double-strobe path delay vectors.



Fig. 2. Addscan tool flow. Addscan is an in-house software tool
for scan insertion. Synopsys is an automatic design synthesis tool
from Synopsys, Inc.



The following test vector sets were created for each of the
memory subsystem ASICs:

• Continuity. Checks for opens and shorts among the ESD
protection diodes. Prepared manually.

• Ringtest. Uses serial "flush" speed (total scan path delay)
through the boundary scan register as a measure of the IC
process and verifies that the part is within the six-sigma
range. Generated manually in the form of a Cadence Verilog
body file.

• Dc. These tests use the boundary scan ring to drive out all
ones or zeros for dc parametric testing. Generated manually
in the form of a Verilog body file.

• Leakage and tristate testing. Places the ASIC into a
high-impedance state to allow testing the I/O pads for leakage.
Generated manually in the form of a Verilog body file.

• IDDQ. These vectors are generated by ATPG and are used to
perform static IDDQ test and measurement.

• TAP Tests. These are tests targeted at functional testing of
the TAP controller. Generated manually in the form of a
Verilog body file.

• Chiptest. These vectors are generated by ATPG to test the
core chip logic in from and out to the boundary scan ring
using the TAP CHIPTEST instruction. I/O pad logic is not fully
tested by chiptest vectors.

• Pintest. These vectors are generated by ATPG and will test
the remaining faults (primarily in the I/O pad logic) that are
not covered by the chiptest.

• Bus Holder. Further testing of the electrical characteristics
of the bidirectional I/O cells. Generated manually in the
form of a Verilog body file.

• BIST. BIST vectors are only generated on the memory
controller. These tests require only two scan vectors, one each
to set up the initialization and test passes for BIST. After
that, a burst of system clocks is applied to test the target
blocks at speed. These vectors are generated using a Perl
script to produce a tst vector file.

• Double-Strobe. These vectors are generated by ATPG based
on Timver critical paths and are used to provide at-speed
testing of the ASIC.

• Ac testing of I/O Paths. These are functional tests that test
the speed characteristics of critical I/O paths. Generated
manually only if testing, design review, and chip
characterization results indicate a concern.

• Process, Voltage, and Temperature (PVT) Block Test.
Generated manually, this group of tests applies only to the slave
memory controller chip which uses a unique PVT block to
compensate for process, voltage, and temperature
variations in a particular I/O cell.

Fig. 3. ATPG (automatic test pattern generation) and timing
tools flow. Timver is a timing analysis tool from Aida, Inc. PLTSIM
is an in-house fault simulator for test verification. The IDDQ
vectors are generated by ATPG from Crosscheck Corp.

Fig. 4. Vector verification and tester translation tools flow. TSSI
is a test program generation tool from TSSI, Inc. Aida represents a
suite of test tools from Aida, Inc. LSIM is a special FET-level
simulator. SIMITS is a format conversion tool from Schlumberger.

Vector and Test Logic Verification. Fig. 4 shows the flow for
test vector verification using Verilog or LSIM (a special
FET-level simulator). Test vectors from ATPG can be directly
converted using TSSI (a tool for test program generation
from TSSI, Inc.) into a command file format and verified
against a gate-level netlist in Verilog or a FET-level netlist
using LSIM. Alternatively, test vectors can be simulated
using a Verilog body file. A body file is a wrapper or test jig
that can either be a test vector set itself (hand-generated
functional tests) or can run scan-based ATPG vectors using
a scan and clock sequence.

The AT&T Tapdance tool was used for further verification
of the TAP logic before tape release of the ASICs. Tapdance
generates a set of IEEE 1149.1 compliance tests to verify
standard TAP functionality. The Tapdance vectors were
converted using Perl scripts† into a Verilog force file and
simulated on a gate-level netlist.

Tester Format Translation. Fig. 4 also shows the flow for
translation of vectors into a tester format. Using TSSI, vectors
were formatted directly to HP 82000 tester format. To get to
the Schlumberger S9000 tester, vectors were first formatted
to LSIM and then passed through SIMITS, a format converter
from Schlumberger.

† Perl is a high-level programming language.

Using a Verilog programming language interface that outputs
a TSSI simulation event format file dump, vectors can also
be translated from body files to one of the testers.

System DFT Features 

The new systems have been designed to provide a method
to access ASIC scan paths, both boundary and internal, at
the system level. This has two major purposes. First, it
provides a means of accessing the internal state of complex
VLSI components. This provides additional hardware state
information to designers that would typically be
inaccessible and can aid traditional prototype bring-up and debug
methods. Second, it provides the ability to do scan-based
testing of board and system interconnect and internal scan
testing of ASICs.

The following test and debug features are provided by system
scan access:

• Ability to halt the system clocks and interrogate the internal
scan state of the ASICs.

• Single-cycle debug of the system core by halting the system
clocks, interactive scanning of the internal state, and then
starting or cycling the system clocks.

• Board-level and system-level interconnect testing and
interactive debug using boundary scan. This includes testing
connectors between two boards where boundary scannable
buses cross the connector.

• Ability to test an ASIC while it is on the board using
boundary and internal scan. This may include double strobe tests
and running on-chip BIST, if supported by the ASIC under
test.
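As an illustration of the interconnect testing mentioned in the
list above, the sketch below applies a walking-ones sequence to
a set of nets and reports the nets whose received value differs
from the driven value. It is a generic boundary-scan style test,
not the test set used on these systems. The fault behavior
(shorts act as wired-OR, opens read low) and the board model are
assumptions made for the example.

```python
def walking_ones(n_nets):
    """One pattern per net: that net driven high, all others low."""
    return [[1 if i == j else 0 for j in range(n_nets)] for i in range(n_nets)]


def interconnect_test(n_nets, board):
    """board(driven) returns the values captured at the receiving cells.

    Returns the set of nets that misbehave on at least one pattern.
    """
    failing = set()
    for pattern in walking_ones(n_nets):
        received = board(pattern)
        failing |= {
            net for net, (d, r) in enumerate(zip(pattern, received)) if d != r
        }
    return failing


def good_board(driven):
    return list(driven)


def faulty_board(driven):
    """Hypothetical board with two defects."""
    received = list(driven)
    shorted = driven[1] | driven[2]      # nets 1 and 2 shorted (wired-OR assumed)
    received[1] = received[2] = shorted
    received[3] = 0                      # net 3 open (assumed to read low)
    return received


print(interconnect_test(5, good_board), interconnect_test(5, faulty_board))
```

In a real system each pattern would be shifted into the driving
boundary cells, applied with an EXTEST-style instruction, and
captured and shifted out of the receiving cells.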

As part of the overall DFT requirements, all ASICs implement
the IEEE 1149.1 Standard Test Access Port and
Boundary-Scan Architecture. This provides support for system-level
scan access. In addition, key debug support features are
incorporated into the system clock controller chip to allow
for halting and controlling the system clocks. Further
information on system clock controller features can be found in
reference 12.

Fig. 5 shows a diagram of the system-level scan access
hardware. The Texas Instruments PC-based Asset toolset is used
as the interface to system scan. The Asset PC is connected
to a scan adapter board via the Asset interface pod. The
scan adapter board then plugs onto the system board and
provides control of the system clock controller features, the
TAP controller and system logic reset, the clock halt triggers,
and the I/O device clock halt from the Asset software. The
scan paths in the system are configured as a single serial
scan chain with optional system boards implemented as
dynamic scan paths that can be configured in Asset.

The Asset scan tools provide the following capabilities for
system scan access:

• Interactive control of scan path data and TAP controller
instructions with scan-bit name mapping and packing and
unpacking of scan data.

• Macro scripting capabilities for combining several interactive
operations into a single macro command. Asset also accepts
serial vector format scan vectors for user-developed tests.

• Specification of system scan path configuration for dynamic
scan paths and optional boards, such as CPUs, memory
extender boards, and I/O.

• Scan path integrity testing and boundary scan interconnect
testing of intraboard and interboard nets.

• Control of system clock halt, single-cycle stepping, and
system and TAP reset.
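Scan-bit name mapping and packing and unpacking, the first
capability in the list above, amount to translating between a
raw bit stream and named fields. The sketch below shows the
idea in isolation; it does not use or imitate the Asset
software's interface, and the field names and widths are
hypothetical.

```python
def unpack(scan_bits, layout):
    """Split a scanned-out bit string into named fields.

    layout is a list of (name, width) pairs in scan order.
    """
    if len(scan_bits) != sum(width for _, width in layout):
        raise ValueError("scan data length does not match the layout")
    fields, position = {}, 0
    for name, width in layout:
        fields[name] = int(scan_bits[position:position + width], 2)
        position += width
    return fields


def pack(fields, layout):
    """Build the bit string to scan in from named field values."""
    return "".join(
        format(fields[name], "0{}b".format(width)) for name, width in layout
    )


# Hypothetical slice of an internal scan path.
layout = [("queue_head", 3), ("queue_tail", 3), ("parity_error", 1)]
state = unpack("1010111", layout)
print(state)

state["queue_tail"] = 0          # edit one field by name
restored = pack(state, layout)   # and pack it for scanning back in
print(restored)
```

Working with names rather than bit positions is what makes
interactive single-cycle debug practical on chains that are
thousands of bits long.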
Fig. 5. System-level scan access. Asset is a set of scan tools
from Texas Instruments, Inc.

Results and Conclusions

The DFT techniques described above, which were
championed by the DFT core team, were implemented in several
different ASICs with varying degrees of adherence to the
DFT rules and methodology. In general, results obtained
during prototype chip debug have shown a direct correlation
between the level of DFT implementation and the rapidity of
test development, chip characterization, and root-cause
analysis. For example, while the three memory subsystem
ASICs were the last to reach tape release, these chips were
the first to reach the operational test release (OTR) and
release to manufacturing test (RTPT) checkpoints. The
availability of high-quality and comprehensive test sets for these
chips enabled chip characterization efforts to be started
right away. Furthermore, success in reaching the OTR
checkpoint made it possible to transfer the task of testing
prototype chips (which are used in the prototype systems)
to the manufacturing engineers. This had a very positive
effect on resources available to perform chip
characterization. In turn, successful completion of this step coupled with
efforts of the R&D engineers to improve test coverage
enabled the team to reach the RTPT milestone well before any
of the other ASICs had reached their OTR checkpoints.

The Asset tool and its customized extensions provided a
low-cost system scan access solution with flexible functionality
and ease of use. As a commercial tool solution it cut down on
development and maintenance costs compared to developing
a proprietary toolset and can be reused for future projects.
floating-point controller for the PA 7100LC. He also
worked on the floating-point controller design for the
PA 7100 processor and earlier models of the HP 9000
Series 700 computers. He has published two papers
on superscalar PA-RISC processors, one in the 1992
Compcon Proceedings and the other in the 1993
Compcon Proceedings. Will currently works for Cyrix
Corporation as a senior engineer involved with CPU
functional verification. He is married, has two
daughters and enjoys basketball.
Mick Bass

Author's biography appears elsewhere in this section.

Terry W. Blanchard

An R&D manager at HP's Systems Technology Division,
Terry Blanchard is currently responsible for core
CPU control design and verification, cache control
design, and the physical implementation of the cache data
path for a new PA-RISC CPU chip. His primary
responsibilities since joining HP in 1989 have been in the
areas of presilicon and postsilicon verification, chip
modeling, simulation, emulation, and tools liaison with
outside vendors. He has contributed to several PA-RISC
CPU chip efforts, including the PCX, PCX-S, PA 7100,
PA 7100LC, and subsequent processors. He attended
Brigham Young University from which he received a
BSECE degree in 1989. Terry is married, has three young
daughters, and enjoys classical and Broadway music,
camping, landscaping, woodworking, and spending time
with his family. He also leads a local Boy Scouts troop.
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D. Douglas Josephson




Doug Josephson is a technical contributor with the
Systems Technology Division



for future HP processors. He was responsible for circuit
design, test methodology, introduction of new test
methods, and test and electrical characterization for the
PA 7100LC processor. Previous work at HP includes test
engineering and electrical characterization for the PCX
and PCX-S processors, which were the first CMOS PA-RISC
designs created at HP. He came to HP in 1988 at the
Integrated Circuits Business Division. Born in Monmouth,
Illinois, he attended the University of Iowa, where he
received his BSEE degree in 1988. Before coming to
HP, he was a summer intern at Lattice Semiconductor
where he worked on electrical characterization of
electrically erasable programmable logic devices.
Doug is a member of the IEEE and is interested in
custom VLSI circuit design and design for testability.
He has published a paper on PA 7100LC features and
an article on microprocessor IDDQ testing. His work
has resulted in a patent pending for the sample-on-the-fly
technique described in this issue. He enjoys
backpacking, rock climbing, and high-altitude
mountaineering.

Duncan Weir 

Duncan Weir has been a member of the technical
staff at Systems Technology Division since he joined HP
in 1986. Born at Clark Air Force Base in the Philippines,
he attended Washington University from which he
received BSEE and BSCS degrees in 1986. Previous HP
accomplishments include verification of the PCX, PCX-S,
and PCXT processors. He is currently responsible for
verification of the chip that is the follow-on to the
PA 7100LC processor. He worked on postsilicon
verification for the PA 7100LC processor. His work has
resulted in a patent pending for the hardware TLB miss
handler on the PCXT processor. He is interested in the use
of random code generation to test PA-RISC processors.
He is married and says he is kept pretty busy
entertaining and exercising his two dogs.

Daniel L. Halperin

A native of Oak Ridge, Tennessee, Dan Halperin
attended the University of Tennessee from which he
received a BSEE degree in 1979. He continued his
studies at the University of Illinois, earning an MSEE in
1981 and a PhD in 1984. He is a member of the technical
staff at the Systems Technology Division and is presently
responsible for the physical implementation of control
blocks similar to the ones described in this issue.
Previous contributions include work on five different
designs of

hardware. Before coming to HP, he

served as a research and teaching assistant at the

University of Tennessee and at Illinois.

Hersar- "^ 

puterait -^ ^ 

married, has two children and enjoys skiing, bicycling,
skating, and electronic keyboards.
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Thomas V. Spencer

Born in Louisville, Kentucky and raised in southern
Indiana, Tom Spencer attended Purdue University from
which he received a BSEE degree in 1980 and an
MSEE in 1982. He came to HP's Desktop Computer
Division in 1982 and is now an R&D project manager
with the Fort Collins system laboratory of the Systems
Technology Division. Previous accomplishments at HP
include design engineering for the HP 9000 Models 825
and 845 computer systems. He was also project manager
for the LASI chip used in the HP 9000 Models 712, 715/64,
715/80, and 715/100. He is currently working on the
I/O chip for the next generation workstation. He is
married and the new father of triplets born in November
1994. Before the triplets his hobbies included
mountain biking, tennis, basketball, and skiing. His
new hobbies include changing diapers and feeding
babies.

Frank J. Lettang 

A development engineer with the Systems Technology
Division, Frank Lettang has been with HP since
1980. He holds a BSEECS degree from the University
of California at Berkeley and an MSEE from Stanford
University. He worked on HP 9000 Series 800 computer
systems, the HP 9000 Series 700 workstations, and the
general system connect (GSC) and SCSI interfaces in the
LASI chip. He is presently working on a new family of
workstations. He is a member of the IEEE and a member
of the PCI electrical working group. Frank is interested
in high-speed digital design, RF engineering, and signal
processing, and has authored an HP Journal article
about clocks for high-speed workstations. Born in
Lynnwood, California, he is married and has two children.
He enjoys gardening, hiking, cycling, playing
with his children, and participating in elementary
school volunteer work.





Curtis H. McAllister

A member of the technical staff at HP's Systems
Technology Division, Curtis McAllister joined HP in 1988
after receiving a BSE degree from Iowa State University
in that same year. He is currently responsible for
work on a bus converter, which includes work on the bus
interface design and CAD tool support. Previously, he
worked on the HP 9000 Models 715/64, 715/80, and
715/100 workstation products. On the LASI chip he
provided CAD tool support for several ASIC tools and
designed the parallel (Centronics) interface and the
interface from the LASI internal bus to the Intel 82596
megacell. Before the LASI chip he worked on the graphics
ASIC used on the CRX24 graphics adapter and in the
Model 710 workstation. He also designed an HP 300
physical DMA interface board for the TurboVRX
graphics accelerator. Before joining HP, he did summer
internships at McDonnell Douglas and Texas
Instruments, working on various hardware design and
software programming projects. His work has resulted
in a patent on using buffer pointers for direct
memory access. He has published an article on frame
buffer design. Curtis is interested in VLSI design,
computer architecture, and CAD tools, and is a member
of the IEEE. He was born in Mt. Pleasant, Iowa
and enjoys skiing, basketball, and day hikes.



Anthony L. Riccio

Anthony Riccio came to HP in 1976 at HP's Calculator
Products Division. He is now a member of the technical
staff at the Fort Collins systems laboratory. He was
born in Norwalk, Connecticut, and attended the
Rochester Institute of Technology from which he received
a BSEE degree in 1975. He is presently working on the
processor board for the next generation of desktop
workstations. Previous HP contributions include hardware
design on an E-beam project at HP Laboratories, VLSI
design for PA-RISC CPU, cache, and memory chips,
processor design for the HP 9000 Models 750, 712, and
715 workstations, and design work on the LASI chip.
His work has resulted in a patent related to a
programmable maximum/minimum counter for performance
analysis of computer systems and he is a member of the
IEEE. Tony is married and has two children. He enjoys
family outings, playing and coaching hockey, coaching
soccer, playing tennis, hiking, biking, sailing,
photography, and reading. He also participates as a
Christian hockey camp counselor.
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Joseph R. Orth

A member of the technical staff at HP's Systems
Technology Division, Joe Orth joined HP in 1992. He was
born in Jerome, Idaho and attended the University of
Idaho from which he received a BSEE degree in
1992. He is presently responsible for environmental
testing of the next generation CPU chip. He worked on
the LASI chip's PS/2 interface for the project reported
in this issue. He is interested in digital design and
enjoys rock climbing, hockey, basketball, softball, and
hiking.

Brian K. Arnold 

Brian Arnold was born in Canon City, Colorado and
attended Colorado State University from which he
received a BSEE degree in 1987. He is a member of the
technical staff at the Systems Technology Division
and has been with HP since 1989. He was a graphics
hardware engineer for two different products and wrote
low-level graphics software for the Model 425e
workstation. He is currently working as a design engineer
on the next generation PA-RISC processor, which includes
responsibility for three standard cell libraries,
Synopsys and Cell3 tools support, and block design. He
was responsible for the audio and telephony interface
for the LASI chip. Before coming to HP, he researched
multilevel logic, fast arithmetic algorithms, and design
engineering on an MIMD supercomputer. His work at HP
has resulted in a patent related to telephony interface
design. He is interested in computer architecture as
well as design and IC tools. He has published several
articles on the benefits of a signed-binary number
system and also on algorithms that perform fast
multiplication and division using a signed-binary
representation of numbers. He is married, has a
four-month-old son, enjoys outdoor sports and recreation,
and leads a marriage ministry at his church.
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Paul Martin 

Paul Martin was born in Salt Lake City, Utah. He
received a BSEE degree from the University of Colorado
(1982) and an MSEE from Stanford University (1984).
He worked on diagnostic software development for
Bell Laboratories and mapping software for the National
Oceanic and Atmospheric Administration before joining
HP's Fort Collins System Division in 1985. He worked as
a designer for the SRX and CRX graphics systems and as a
manager for microcode development for the TurboSRX
graphics subsystem. He was also a manager for graphics
development for the Model 425e workstation. He
worked on the development of the graphics chip for
the Model 712 workstation. As an R&D manager in
the graphics hardware laboratory at HP's Workstation
Systems Division, he is presently working on developing
the next generation single-chip graphics controller.
He has coauthored a paper on the TurboSRX
graphics subsystem. He enjoys mountain biking, skiing,
and playing the guitar.
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Anthony C. Barkans

Tony Barkans has been a member of the technical
staff in the graphics hardware laboratory at the Systems
Technology Division since he joined HP in 1988.
His past HP contributions include work on scan
conversion chips used in the TurboVRX and HCRX48Z
graphics subsystems and various other chips used in the
CRX and HCRX graphics subsystems. He is currently
responsible for investigating graphics features for
possible inclusion in future HP workstations. Before
coming to HP Tony worked on designing hardware for
computer graphics workstations while at Applicon
Incorporated. His work has resulted in five U.S. patents,
four U.S. patents pending, and two international patents
pending, all relating to computer graphics hardware,
including color recovery. He is a member of the IEEE and
has published several technical articles related to
computer graphics. He is married, has a son and a
daughter, and enjoys everything from short weekend family
trips to the Colorado mountains to longer journeys for
snorkeling in the warm Caribbean waters.
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Ruby B. Lee

Ruby Lee serves as chief architect for HP's
interdivisional multimedia architecture team and senior
technical contributor in the computer systems
architecture group in HP's Computer Systems Organization.
She is currently leading the HP-Intel multimedia task
force and continues to work on architecture evolution in
the PA-RISC extensions team. She is also a consulting
associate professor of electrical engineering at Stanford
University. She joined HP Laboratories in 1981 and was
one of the original architects of PA-RISC. She was the
lead designer of the first CMOS PA-RISC processor chip
and manager of subsequent microprocessor designs.
Ruby attended Cornell University, receiving a BA degree
with distinction in computer science and comparative
literature. She continued her studies at Stanford
University, earning an MS in computer science and a PhD
in electrical engineering in 1980. Before joining HP she
served as assistant professor of electrical engineering
at Stanford University. Ruby designed the PA-RISC
multimedia enhancements and led the multimedia
architecture team that coordinated work across four HP
sites involved with real-time MPEG decoding in software.
Her work has resulted in seven patents and several more
pending in the areas of processor architecture, pipeline
implementations, branch optimizations, cache hints,
assists architecture, and multimedia. She has authored
or coauthored approximately thirty papers on subjects
related to PA-RISC architecture, multimedia, parallel
processing, VLSI, and computer performance. She is a
member of the IEEE and the ACM. Ruby is married, has
two children, and enjoys skiing, aerobics, windsurfing,
and classical music.

John P. Beck 

John Beck is an engineer/scientist with the Work- 
station Technology Division. He joined HP when 
Apollo was acquired by HP in 1989. He has worked 
on digital video conferencing and the CRX graphics
system. Before coming to HP he was with Digital
Equipment Corporation and Apollo where he worked
on high-performance CPUs and workstations. His
work has resulted in two patents pending related to
parallel subword averaging and digital video
compression. He was a key contributor to the software
MPEG project, and he did the initial software
optimizations.

Joel Lamb 

Joel Lamb has been a member of the technical staff at
HP's Systems Technology Division since 1985 when he
first joined HP. Born in McCook, Nebraska, he attended
the University of Nebraska where he received
his BSEE degree in 1983 and his MSEE in 1985. He is
currently working on a new PA-RISC processor. He was
responsible for implementing the multimedia instructions
for the PA 7100LC processor. His previous work at HP
includes work on the circuit design for several PA-RISC
processors and design of an integer ALU, integer shifter,
and integer data path. He is a member of the IEEE
and his work has resulted in four patents related to
VLSI circuit designs. Joel is married and has two
sons.

Kenneth E. Severson

A project manager with the Workstation Systems
Division, Ken Severson is presently responsible for
imaging software at the graphics software laboratory in
Ft. Collins, Colorado. He came to HP in 1989 as an Apollo
employee working on 3D graphics. He was responsible for
leading the group that worked on digital video for the
MPEG decompression project in the multimedia lab in
Chelmsford, Massachusetts. Born in Grand Forks, North
Dakota, he attended the University of North Dakota where
he earned a BS degree in physics and mathematics in
1974 and an MS in physics in 1977. Before working
at Apollo or HP he worked on software development
in computer-aided design and manufacturing at Si-
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S. Paul Tucker 

Paul Tucker is a member of the technical staff in the
graphics hardware laboratory in HP's Workstation Systems
Division. He joined HP's Graphics Technology
Division in 1988 and is currently responsible for work
on HP's next generation of 3D graphics products. He
received a BSEE degree (1986) and an MSEE (1987) from
Oklahoma State University, and his work has resulted in
a patent related to graphics memory organization. His
professional interests include graphics and digital
signal processing. Born in Tulsa, Oklahoma, he is
married and enjoys music, scuba diving, cycling, skiing,
golf, personal computers, and brewing beer.
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Arlen L. Roesner

Arlen Roesner was born in Glendive, Montana and
grew up in Montana's Glacier Park area. He attended
Montana State University, earning a BS degree in
mechanical engineering with a minor in physics in 1984.
He is a product design engineer in the Fort Collins
system laboratory of the Systems Technology Division
where he is responsible for the mechanical design of the
next generation of workstations. With HP since 1984, his
previous work includes mechanical design for several
HP 9000 Series 700 workstations. He was responsible for
the mechanical and thermal design of the Model 712
workstation described in this issue. His work has
resulted in two patents related to an EISA/ISA card
mounting concept for rear-instrument access, and a
mass-storage tray concept for rear-instrument access of
multiple drives. He is the author of a previous journal
article on the mechanical design of the HP 9000
Models 720 and 730 workstations. He is married, has
two children, and enjoys fishing, woodworking, and
computers. He is an active participant in the Fort
Collins Vineyard Christian Fellowship and the suicide
task force of Larimer County.
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when he began work at the
HP Computer Support Division in Roseville, California.
He received his BSEET degree from California State
Polytechnic University at Pomona in 1978 and an MSECE
(electrical and computer engineering) from the University
of California at Davis in 1985. Currently responsible for
midrange HP 3000 and HP 9000 platforms, Dennis provided
project leadership for the low-end business servers
described in this issue. Previous contributions at HP
include production engineering at both the Computer
Support Division and the Computer Systems Division
on the HP 1000 and the HP 3000 Series 37 computer
systems. He has also worked in different technical
development capacities on several HP 3000 and HP
9000 computer systems. Before joining HP he worked
at Odetics Inc. in Anaheim, California as a lab
technician, Computer Automation's Automatic Test Division
in Irvine, California as a field support engineer, and
Alpha Microsystems in Irvine, California as an R&D
engineer. He is a member of the IEEE and the ACM.
Born in Reading, Pennsylvania, he is married and has
two sons. His favorite hobby is to tinker in his garage
(usually accompanied by both sons). He enjoys auto
restoration (currently working on a 1934 Ford 5-window
coupe), is a licensed ham radio operator
(KE6HLC), and is active in his church and involved
with several bible study groups. He also serves as an
assistant scout master for the Boy Scouts of America.

Gerard M. Enkerlin 

Gerard Enkerlin was born in San Luis Potosi, Mexico and
attended the University of North Dakota where he
received a BS degree in engineering management and
electrical engineering in 1986. He continued his
studies at the University of

Texas, earning an MS in manufacturing engineering
in 1983, joining HP's Network Computer Manufacturing
Operations that same year. He is presently a
manufacturing program engineer responsible for new
product introduction for new low-end servers. Previous
experience at HP includes work as a product
engineer, process engineer, and program manager for
various business server projects. Before coming to HP
he worked for Delco Products of General Motors as a
master scheduler and manufacturing engineer. He is
interested in manufacturing and distribution modeling
and manufacturing optimization analysis. Gerard
is married and enjoys golf, soccer, skiing, traveling,
reading, and movies. He is also fluent in Spanish,
German, and English.
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■ s^mufabon. and 
the HP 30£X1 St.; .c. -^^^ .-_■ .. ... , .^... ibi systems She 
was also a project manager for the HP 3000 Series
922 to 958 and the HP 9000 Series 822 to 852 computer
systems. Her responsibilities for the project
reported in this issue were as project manager for
the HP 3000 Series 9x7 and HP 9000 Series Ex5 computer
systems. Karen is currently a project manager
for future low-end multiuser server systems. She
received a BSEE degree in computer engineering from
Iowa State University in 1979. She authored an HP
Journal article on testability in 1982. Born in Iowa,
she is married and has a son and two golden retrievers.
She enjoys woodworking, playing soccer and
basketball, and coaching her son's soccer team. Karen
also participates in bringing computer and electronics
technology into her son's third-grade classroom.
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Eileen Keremitsis 

Eileen Keremitsis is a publications consultant with
Rhinoceros Consulting of San Francisco. She received a BA
degree in Latin American studies from the University
of California at Berkeley in 1973, an MA in history and
economic development from Columbia University in 1976,
and a PhD (Fulbright Scholar) in Latin American history
from Columbia University in 1982. She worked with the HP
Distributed Smalltalk team from 1993 to 1994 creating all
user documentation for the first three major releases
and providing all learning products support. She has
worked with object-oriented software for eight years
and has been involved with technical publications for
ten years. She previously taught history at the
University of Maine and the University of North Carolina
at Charlotte. She is a senior member of the Society
for Technical Communication and a member of the
ACM and the IEEE Computer Society.




Ian J. Fuller 

Ian Fuller is a marketing program manager at the
Software Engineering Systems
Division. He joined HP in 1980 with the Office Systems
Division in Pinewood near Reading, England. He
received a BS degree in physical and computer






sciences from Oxford Polytechnic in 1978. As a product
manager, Ian is presently engaged in consulting
and marketing for HP Distributed Smalltalk. Previous
contributions at HP include project management for
HP Desk Manager, software engineering for the HP
Cooperative Computing Center, and project management
for HP NewWave. He was also a software architect
for the Information Architecture Group, marketing
product manager at the Open Systems
Software Division, and the HP representative on the
Object Management Group's technical committee.
Before joining HP Ian was with ITT Business Systems
in London, England where he programmed computer
message switch software. He is interested in distributed
systems and object-oriented systems for business
applications and has published two previous
articles in the HP Journal concerning HP DeskManager
and NewWave. He was born in Gosport, Hampshire,
England, is married, and enjoys travel (especially
to Asia) and photography.
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Manny Yousefi

Manny Yousefi came to HP's Manufacturing Productivity
Division in 1988 and is now a project manager with the
Professional Services Division in Mountain View,
California. He was the project manager for development of
the Software Solution Broker reported in this issue and
is one of its coarchitects. He was also responsible for
the negotiation and procurement of third-party product
licenses needed for various Software Solution Broker
product features. He received a BS degree in computer
science from the California Polytechnic State University
in 1983. His previous accomplishments at HP include
work as an architecture team member for an
object-oriented manufacturing application and work on an
object-oriented text management system. Before
coming to HP, Manny was with U.S. Sprint as a senior
software engineer and project manager and with
Memorex Corporation as a software engineer. He is
interested in object-oriented development and
methodologies and distributed client/server applications.
In his free time, he enjoys gardening.

Adel Ghoneimy

Adel Ghoneimy is a freelance software consultant
and has been working with HP's Professional Services
Division since 1993. He is responsible for the
architecture and the ongoing development of the Software
Solution Broker reported in
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